This article examines density estimation by combining a parametric approach
with a nonparametric factor. The plug-in parametric estimator is seen as a
crude estimator of the true density and is adjusted by a nonparametric factor.
The nonparametric factor is derived by a criterion called local $L_2$-fitting.
A class of estimators that have multiplicative adjustment is provided,
including estimators proposed by several authors as special cases, and the
asymptotic theories are developed. Theoretical comparison reveals that the
estimators in this class are better than, or at least competitive with, the
traditional kernel estimator in a broad class of densities. The asymptotically
best estimator in this class can be obtained from the elegant feature of the
bias function.




1. Introduction. Smoothing is a very important area of statistical analysis
and has a wide range of applications in the mathematical sciences. The present
article is concerned especially with density estimation. Let $X_1, \ldots, X_n$
be independently and identically distributed with density $f$. The problem is
to estimate the density function $f$ from the data. Two approaches to this
problem exist.

The first is called the parametric approach. In this approach, we prepare
a parametric model

    $\{g(x,\theta) : \theta \in \Theta\},$

where $\theta$ is a $p$-dimensional parameter vector and $\Theta$ is the
parameter space in $\mathbb{R}^p$. In practice the family of densities is
constructed from previous experience and preanalysis of the underlying
structure.

K. NAITO

Then estimation of the density function is replaced by estimation of the
unknown parameter vector $\theta$. Finally, we obtain a density estimator

    $\hat f(x) = g(x,\hat\theta),$

where $\hat\theta$ is an estimator of $\theta$. This approach is called the
plug-in parametric approach and is justified only when the true $f$ is exactly
as in the model or at least in the neighborhood of the model.

The other approach is nonparametric. Several methods for nonparametric
density estimation have been proposed and investigated. Izenman (1991)
summarized a number of these methods. A representative method is the
traditional kernel density estimator of $f$,

(1.1)    $\tilde f(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(X_i - x),$

where $K_h(z) = h^{-1}K(h^{-1}z)$, $K(\cdot)$ is some chosen density which is
symmetric about zero, and $h$ is the bandwidth. The basic properties of
$\tilde f$ are well known and under smoothness conditions we have

(1.2)    $E\tilde f(x) = f(x) + \frac{h^2}{2}\mu_{2,K}f''(x) + O(h^4),$
         $\operatorname{Var}\tilde f(x) = \frac{R(K)}{nh}f(x) - \frac{f(x)^2}{n} + O\!\left(\frac{h}{n}\right),$

where $\mu_{q,G} = \int z^q G(z)\,dz$ and $R(G) = \int G(z)^2\,dz$ for some
kernel function $G$ [cf. Simonoff (1996) and Wand and Jones (1995)]. The
traditional kernel estimator is by construction completely nonparametric in
the sense that it has no preferences and works reasonably well for almost all
shapes of densities. Like the kernel estimator, all nonparametric methods can
be used without the structural assumption that the underlying structure is
controlled or captured by a finite-dimensional parameter. Thus, nonparametric
approaches have attractive flexibility; however, the parametric model is
difficult to discount because a structure that is well estimated by the
parametric approach is easy to understand.
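As a concrete illustration of the kernel estimator in (1.1), the following is a minimal sketch in Python. The Gaussian choice of $K$, the function names and the toy sample are assumptions made here for illustration; they are not taken from the article.

```python
import math


def gaussian_kernel(z):
    """Standard normal density, used here as the symmetric kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_density(x, data, h):
    """Kernel estimate (1/n) * sum_i K_h(X_i - x), with K_h(z) = K(z/h)/h."""
    n = len(data)
    return sum(gaussian_kernel((xi - x) / h) for xi in data) / (n * h)


# Hypothetical toy sample, for illustration only.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
estimate_at_zero = kernel_density(0.0, sample, h=0.5)
```

Any symmetric density could replace the Gaussian kernel; only `gaussian_kernel` would change.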

This motivates us to propose an approach which includes both the parametric
approach and the nonparametric approach. We propose and investigate a class of
semiparametric density estimators which have precision comparable to, and
sometimes better than, that of $\tilde f$. One class considered herein is the
set of density estimators derived from the local $L_2$-fitting criterion with
index $\alpha$. In the proposed approach, the parametric plug-in density
estimator $g(x,\hat\theta)$ is utilized, but it is seen as a crude guess of
$f(x)$. This initial parametric approximation is adjusted via multiplication
by an adjustment factor $\xi = \xi(x)$. That is, the initial approximation is
adjusted via the form $g(x,\hat\theta)\xi$. The local fitting approach is used
to determine the adjustment factor. Throughout the present article,
$\xi = \xi(x)$ is determined by minimization of the empirical version of the
function

(1.3)    $Q(x,\xi|\alpha) = \int K_h(t-x)\,\frac{\{f(t) - g(t,\hat\theta)\xi\}^2}{g(t,\hat\theta)^{\alpha}}\,dt$

for a fixed target point $x$. This method is called the local $L_2$-fitting
criterion, where $\alpha$ is a real number called the index. Observe that
local fitting is obtained using the kernel function $K$. The symmetric density
$K$ creates the fitting locally around the target point $x$. This local
approach is based on the simple intuition that observed data which are far
from the target point $x$ do not have information about the adjustment. The
minimizer of the empirical version of (1.3) is our objective and is denoted by
$\hat\xi = \hat\xi(x)$. Using this $\hat\xi$, we finally obtain a density
estimator $\hat f(x) = g(x,\hat\theta)\hat\xi(x)$. This approach is shown to
be effective and yields a theoretically good estimator in the sense of mean
integrated squared error (MISE). A similar but somewhat different approach was
proposed by Copas (1995) in conjunction with the likelihood method under
censoring. Eguchi and Copas (1998) also discussed a class of local likelihood
methods and developed asymptotics under a large bandwidth $h$. Their approach
is the local estimation of $\theta$ in the model $g(x,\theta)$ and the
adjustment factor $\xi$ does not appear. The present approach is the local
estimation of $\xi$ using a previously obtained plug-in parametric estimator
$g(x,\hat\theta)$.

This multiplicative approach is closely related to studies performed by 
Hjort and Glad (1995) and Hjort and Jones (1996). Hjort and Glad (1995) 
proposed a density estimator based on the naive estimator of $\xi$. In addition,
Hjort and Jones (1996) suggested and investigated two versions of multi- 
plicative density estimators. One class of density estimators considered here 
includes these estimators as special cases, so this article may be seen as a 
generalization of these previous works. 

The class of density estimators is developed in Section 2, and the
estimators proposed by Hjort and Glad (1995) and Hjort and Jones (1996) are
reviewed through examples. The behavior of the present estimators is
investigated in Section 3, which also reveals that the present result is
indeed a generalization of the results of Hjort and Glad (1995). To leading
order, the variance of the present estimator is the same as that of the
traditional kernel estimator $\tilde f$, but the structure of the bias has a
different form that depends on the initial parametric approximation. As an
important property, we confirm that if $f$ is in the model, the estimator has
reduced bias. Approximate or asymptotic MISE (AMISE) is derived in Section 4.
Furthermore, the best estimator in the class is determined from the simple
result that the bias is linear in $\alpha$. In Section 5 we compare the
present estimator with $\tilde f$ for the case in which $f$ belongs to a class
of normal mixture densities. In particular, a comparison is performed for 15
different test densities proposed by Marron and Wand (1992). In addition, a
similar comparison for the case in which $f$ is the skew-normal distribution
proposed by Azzalini (1985) is also discussed in Section 5. In Section 6 a
simple algorithm to choose the best $\alpha$ is proposed. This algorithm is a
variant of that used by Hjort and Glad (1995). Furthermore, two methods of
data-based selection of $\alpha$ are discussed, and theoretical results and
the practical algorithm are documented. These methods are constructed by
reference to the theory of estimating the density functional discussed by
Hall and Marron (1987) and Wand and Jones [(1995), Section 3.5]. Finite sample
performance of the proposed estimators, and comparisons with $\tilde f$, the
Hjort and Glad and the Hjort and Jones estimators, are investigated by Monte
Carlo simulation in Section 7. Supplementary remarks are presented in
Section 8. The integral of the estimator is in general not unity, but the
expansion as $h$ tends to zero shows that it is $1 + O(h^4)$ provided that we
adopt a Gaussian density as an initial parametric model. A practical
expression of the proposed estimator for the case of a Gaussian kernel and
model is presented. Proofs of the theoretical results are presented in
Section 9.

2. Local $L_2$-fitting criterion. This section is devoted to the construction
of the present density estimator. First, we prepare a plug-in parametric
density estimator $g(x,\hat\theta)$, where $\hat\theta$ is an estimator of the
least false value $\theta_0$ according to a certain distance measure between
$f$ and $g(\cdot,\theta)$. The maximum likelihood estimator is a
representative candidate for $\hat\theta$, in which case the distance measure
is the Kullback-Leibler distance $\int f(x)\log\{f(x)/g(x,\theta)\}\,dx$ and
$\theta_0$ is defined as the minimizer of the Kullback-Leibler distance over
$\theta$. This parametric estimator is seen as a crude guess of $f$. Next, we
aim to adjust this initial approximation by the form $g(x,\hat\theta)\xi$,
where $\xi = \xi(x)$ is the adjustment factor. The problem is determination of
$\xi$. To explain this method more clearly and to introduce the approaches
proposed by Hjort and Glad (1995) and Hjort and Jones (1996), we present three
examples below. Note that the kernel function $K$ is a symmetric density and
the notation utilized in (1.1) and (1.2) is used throughout.

Example 1 (Hjort and Jones estimator). To determine the adjustment factor
$\xi$, Hjort and Jones (1996) suggested the function of $\xi$

    $q(x,\xi) = \int K_h(t-x)\{f(t) - g(t,\hat\theta)\xi\}^2\,dt.$

The optimal $\xi$ is determined by minimization of the estimate of $q(x,\xi)$
over $\xi$. That is, we seek to minimize

    $q_n(x,\xi) = \xi^2\int K_h(t-x)g(t,\hat\theta)^2\,dt - \frac{2\xi}{n}\sum_{i=1}^{n}K_h(X_i-x)g(X_i,\hat\theta),$

which gives

    $\hat\xi = \hat\xi_{HJ} = \arg\min_{\xi} q_n(x,\xi) = \frac{n^{-1}\sum_{i=1}^{n}K_h(X_i-x)g(X_i,\hat\theta)}{\int K_h(t-x)g(t,\hat\theta)^2\,dt}.$

The density estimator is obtained by

(2.1)    $\hat f_{HJ}(x) = g(x,\hat\theta)\hat\xi(x) = g(x,\hat\theta)\,\frac{n^{-1}\sum_{i=1}^{n}K_h(X_i-x)g(X_i,\hat\theta)}{\int K_h(t-x)g(t,\hat\theta)^2\,dt}.$

Although not fully discussed, this $\hat f_{HJ}$ is the resultant estimator
suggested by Hjort and Jones [(1996), page 1636].

Example 2 (Local likelihood estimator). The factor $\xi$ is determined by
minimizing the empirical form of

    $\ell(x,\xi) = \int K_h(t-x)\left[f(t)\log\frac{f(t)}{g(t,\hat\theta)\xi} - \{f(t) - g(t,\hat\theta)\xi\}\right]dt,$

which is equivalent to maximizing that of

    $L(x,\xi) = \int K_h(t-x)\{f(t)\log\{g(t,\hat\theta)\xi\} - g(t,\hat\theta)\xi\}\,dt.$

The term $\ell(x,\xi)$ can be seen as a local version of the Kullback-Leibler
distance from $f(x)$ to $g(x,\hat\theta)\xi$. The resultant adjustment factor
is

    $\hat\xi = \hat\xi(x) = \frac{\tilde f(x)}{\int K_h(t-x)g(t,\hat\theta)\,dt}$

and the ensuing estimator is

(2.2)    $\hat f_{LL}(x) = g(x,\hat\theta)\hat\xi(x) = g(x,\hat\theta)\,\frac{\tilde f(x)}{\int K_h(t-x)g(t,\hat\theta)\,dt} = \tilde f(x)\,\frac{g(x,\hat\theta)}{\int K_h(t-x)g(t,\hat\theta)\,dt},$

where $\tilde f$ is as in (1.1). This $\hat f_{LL}$ was proposed by Hjort and
Jones [(1996), page 1635], who derived and discussed several estimators;
$\hat f_{HJ}$ and $\hat f_{LL}$ are two special estimators with respect to the
multiplicative adjustment scheme.

Example 3 (Hjort and Glad estimator). If we may assume
$f(x) = g(x,\hat\theta)\xi$, then the true adjustment is
$\xi = f(x)/g(x,\hat\theta)$. Hjort and Glad (1995) proposed the naive
estimator

    $\hat\xi(x) = \widehat{\bigl(f(x)/g(x,\hat\theta)\bigr)} = \frac{1}{n}\sum_{i=1}^{n}\frac{K_h(X_i-x)}{g(X_i,\hat\theta)},$

which gives

(2.3)    $\hat f_{HG}(x) = g(x,\hat\theta)\,\frac{1}{n}\sum_{i=1}^{n}\frac{K_h(X_i-x)}{g(X_i,\hat\theta)}.$

In Hjort and Glad (1995) the behavior of $\hat f_{HG}$ was investigated and
was shown to be better than the traditional kernel estimator in the sense of
MISE on a certain class of normal mixture densities.

In the present article we are concerned with a function, namely (1.3), in
conjunction with Examples 1-3. Considering the empirical version of
$Q(x,\xi|\alpha)$ gives, by omitting the irrelevant term, the objective
function

    $Q_n(x,\xi|\alpha) = \xi^2\int K_h(t-x)g(t,\hat\theta)^{2-\alpha}\,dt - \frac{2\xi}{n}\sum_{i=1}^{n}K_h(X_i-x)g(X_i,\hat\theta)^{1-\alpha}.$

Obviously, $\alpha = 0$ gives $q_n(x,\xi)$, so $Q_n(x,\xi|\alpha)$ is a
generalization of $q_n(x,\xi)$ in Example 1 and has weight function
$g(t,\hat\theta)^{-\alpha}$. The minimizer can be easily determined as

    $\hat\xi = \hat\xi(x) = \arg\min_{\xi} Q_n(x,\xi|\alpha) = \frac{n^{-1}\sum_{i=1}^{n}K_h(X_i-x)g(X_i,\hat\theta)^{1-\alpha}}{\int K_h(t-x)g(t,\hat\theta)^{2-\alpha}\,dt},$

which is the proposed adjustment factor. Since the estimator depends on
$\alpha$, by adding the symbol $\alpha$ we have

(2.4)    $\hat f_\alpha(x) = g(x,\hat\theta)\hat\xi(x) = g(x,\hat\theta)\,\frac{n^{-1}\sum_{i=1}^{n}K_h(X_i-x)g(X_i,\hat\theta)^{1-\alpha}}{\int K_h(t-x)g(t,\hat\theta)^{2-\alpha}\,dt}.$

From (2.1)-(2.4), the following relationships hold:

    $\hat f_0(x) = \hat f_{HJ}(x), \qquad \hat f_1(x) = \hat f_{LL}(x), \qquad \hat f_2(x) = \hat f_{HG}(x).$

The case $\alpha = 0$ is trivial. The case $\alpha = 1$ is confirmed by noting
the definition of $\ell(x,\xi)$ and the Taylor expansion of
$(1+y)\log(1+y)$ at $y = 0$. This is also noted in Hjort and Jones (1996). The
equality $\hat f_2 = \hat f_{HG}$ shows that the naive estimator $\hat\xi$
proposed by Hjort and Glad (1995) is characterized as the minimizer of
$Q_n(x,\xi|2)$. Therefore the estimators determined in Examples 1-3 are
connected by $\alpha$. We thus propose a class of density estimators using
$\alpha$ as the index. As described in the following sections, the
introduction of $\alpha$ is essential and enables us to progress toward the
theory of optimality in density estimation by the multiplicative adjustment
scheme. In the following sections we discuss the behavior of estimators in
this class. In addition, the best estimator in this class is determined.
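The multiplicative estimator indexed by $\alpha$ can be sketched numerically as follows. This is only an illustration under stated assumptions: a Gaussian kernel, a Gaussian parametric start fitted by maximum likelihood, and a plain trapezoid rule for the integral in the denominator. The function names are hypothetical and are not taken from the article.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_pdf(t, mu, sigma):
    """Density of N(mu, sigma^2) at t; plays the role of g(t, theta-hat)."""
    return phi((t - mu) / sigma) / sigma


def trapezoid(func, a, b, m=2000):
    """Composite trapezoid rule on [a, b] with m subintervals."""
    step = (b - a) / m
    total = 0.5 * (func(a) + func(b))
    for j in range(1, m):
        total += func(a + j * step)
    return total * step


def f_alpha(x, data, h, alpha):
    """Sketch of g(x) * [n^-1 sum K_h(X_i - x) g(X_i)^(1-alpha)]
    / [int K_h(t - x) g(t)^(2-alpha) dt] with a Gaussian start (assumption)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in data) / n)  # Gaussian MLE

    def g(t):
        return normal_pdf(t, mu, sigma)

    def k_h(z):
        return phi(z / h) / h

    numer = sum(k_h(xi - x) * g(xi) ** (1.0 - alpha) for xi in data) / n
    # The Gaussian kernel is negligible beyond 8 bandwidths from x.
    denom = trapezoid(lambda t: k_h(t - x) * g(t) ** (2.0 - alpha),
                      x - 8.0 * h, x + 8.0 * h)
    return g(x) * numer / denom
```

With `alpha=2.0` the denominator integrates the kernel alone and the sketch reduces to the Hjort and Glad form; with `alpha=1.0` it reduces to the local likelihood form.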
3. Asymptotic theory. In this section, we investigate various statistically
important quantities of $\hat f_\alpha$, such as bias and variance. From the
form of $\hat f_\alpha$, it is clear that its behavior depends on that of
$\hat\theta$ included in the initial parametric approximation
$g(x,\hat\theta)$. To proceed with the theoretical study, we allow a somewhat
more general setting for the choice of estimator $\hat\theta$. Let $F$ be the
true distribution function, the cumulative of $f$, and let $F_n$ be the
empirical distribution function. We consider functional estimators of
$\theta$ of the form $\hat\theta = T(F_n)$ for some smooth functional $T$
having the influence function

    $I(x) = \lim_{\varepsilon\to 0}\,[T((1-\varepsilon)F + \varepsilon\delta_x) - T(F)]/\varepsilon,$

where $\delta_x$ is the unit point mass at $x$, and assume that
$\Sigma_I = E_f[I(X_i)I(X_i)^T]$ is finite. The best parametric approximation
$g_0(x) = g(x,\theta_0)$ to $f(x)$ that $g(x,\hat\theta)$ aims for is
determined by $\theta_0 = T(F)$. It is well known for the case of the maximum
likelihood estimator that $T(F)$ is defined as the solution of the equation
$\int (\partial/\partial\theta)\log g(x,\theta)\,dF(x) = 0$, and so
$I(x) = J^{-1}(\partial/\partial\theta)\log g(x,\theta_0)$, where
$J = -E_f[(\partial^2/\partial\theta\,\partial\theta^T)\log g(X_i,\theta_0)]$.
We may refer to Serfling (1980) for such a functional estimator. Under
regularity conditions [see, e.g., Shao (1991)] we have

(3.1)    $\hat\theta = \theta_0 + \frac{1}{n}\sum_{i=1}^{n}I(X_i) + \frac{d}{n} + \varepsilon_n,$

where $\varepsilon_n = O_p(1/n)$ with mean $O(1/n^2)$. Then we have the
following theorem.

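Theorem 1. Let $g_0(x) = g(x,\theta_0)$, with $\theta_0 = T(F)$, be the best
parametric approximation to $f$. Then, as $n \to \infty$, $h \to 0$,

    $\operatorname{Bias}\hat f_\alpha(x) = \frac{h^2}{2}\mu_{2,K}\left[\frac{(g_0(x)^{1-\alpha}f(x))''}{g_0(x)^{1-\alpha}} - \frac{f(x)(g_0(x)^{2-\alpha})''}{g_0(x)^{2-\alpha}}\right] + O\!\left(h^4 + \frac{h^2}{n} + \frac{1}{n^2}\right),$

    $\operatorname{Var}\hat f_\alpha(x) = \frac{R(K)}{nh}f(x) - \frac{f(x)^2}{n} + O\!\left(\frac{h}{n} + \frac{1}{n^2}\right).$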



The proof is included in Section 9. Note that the leading term of the
variance of $\hat f_\alpha$ is independent of the estimation of $\theta$ and,
with reference to (1.2), it is the same as that of $\tilde f$. Consistency of
the density estimator requires both $h \to 0$ and $nh \to \infty$. The optimal
size of $h$ is proportional to $n^{-1/5}$, which is also the same as that for
$\tilde f$. Furthermore, it is worth noting that if $f$ is in the model
$\{g(x,\theta) : \theta \in \Theta\}$, that is, $g_0(x) = f(x)$, then the
$O(h^2)$ term of the bias vanishes.

From the above observations, the essential difference between the behavior
of $\hat f_\alpha$ and that of $\tilde f$ appears in the bias. As seen in the
next section, the $O(h^2)$ term of the bias of $\hat f_\alpha$ has a nice
expression (4.5), which allows the best estimator in the sense of MISE to be
determined.

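4. Goodness of estimators. In this section, the goodness of estimators is
evaluated in the sense of MISE. In addition, $\hat f_\alpha$ and $\tilde f$
are compared. For a density estimator $\hat f$, let $\mathcal{R}(\hat f)$
denote the integral of the square of the function that multiplies
$(h^2/2)\mu_{2,K}$ in the $O(h^2)$ term of its bias. From Theorem 1 and (1.2),
the AMISE of $\hat f_\alpha$ and $\tilde f$ are, respectively, given by

    $\mathrm{AMISE}(\hat f_\alpha) = \frac{h^4}{4}\mu_{2,K}^2\,\mathcal{R}(\hat f_\alpha) + \frac{R(K)}{nh}$

and

    $\mathrm{AMISE}(\tilde f) = \frac{h^4}{4}\mu_{2,K}^2\,\mathcal{R}(\tilde f) + \frac{R(K)}{nh},$

where

(4.1)    $\mathcal{R}(\hat f_\alpha) = \int\left[\frac{(g_0(x)^{1-\alpha}f(x))''}{g_0(x)^{1-\alpha}} - \frac{f(x)(g_0(x)^{2-\alpha})''}{g_0(x)^{2-\alpha}}\right]^2 dx,$

(4.2)    $\mathcal{R}(\tilde f) = \int\{f''(x)\}^2\,dx.$

So it suffices to compare $\mathcal{R}(\hat f_\alpha)$ and
$\mathcal{R}(\tilde f)$ in the AMISE comparison, provided that we use the same
kernel function $K$. The AMISE comparison will be discussed for special
choices of the underlying $f$, using the same kernel.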
Now we consider the function in the bracket in (4.1) to discover the best
estimator. Let us define

(4.3)    $b_1(x) = f''(x) - f(x)\,\frac{g_0''(x)}{g_0(x)},$

(4.4)    $b_2(x) = 2\left\{f'(x)\,\frac{g_0'(x)}{g_0(x)} - f(x)\left(\frac{g_0'(x)}{g_0(x)}\right)^2\right\}.$

Then it is easily verified that

(4.5)    $\frac{(g_0(x)^{1-\alpha}f(x))''}{g_0(x)^{1-\alpha}} - \frac{f(x)(g_0(x)^{2-\alpha})''}{g_0(x)^{2-\alpha}} = \{b_1(x) + b_2(x)\} - \alpha\,b_2(x).$

That is, the $O(h^2)$ term of the bias of $\hat f_\alpha$ is linear in
$\alpha$. Therefore, writing

(4.6)    $c_1 = \int\{b_2(x)\}^2\,dx,$

(4.7)    $c_2 = \int b_2(x)\{b_1(x) + b_2(x)\}\,dx,$

(4.8)    $c_3 = \int\{b_1(x) + b_2(x)\}^2\,dx,$

we obtain

(4.9)    $\mathcal{R}(\hat f_\alpha) = c_1\alpha^2 - 2c_2\alpha + c_3.$

Using (4.9), we have the leading terms of the integrated squared bias of
$\hat f_{HJ}$, $\hat f_{LL}$ and $\hat f_{HG}$ by substituting
$\alpha = 0$, 1 and 2, respectively. For instance,
$c_3 = \mathcal{R}(\hat f_0)$ is found to be the integrated squared bias of
$\hat f_{HJ}$. The quadratic expression of (4.9) establishes the following
proposition.

Proposition 1. $\mathcal{R}(\hat f_\alpha)$ is minimized over $\alpha$ at

(4.10)    $\alpha_0 = \frac{c_2}{c_1}$

and its minimum value is

(4.11)    $\min_{\alpha}\mathcal{R}(\hat f_\alpha) = c_3 - \frac{c_2^2}{c_1},$

where $c_1$-$c_3$ are given in (4.6)-(4.8), respectively.

The linear structure (4.5) is essential in the derivation of Proposition 1.
This is obtained by introducing $\alpha$ through the weighting
$g(t,\hat\theta)^{-\alpha}$ in $Q(x,\xi|\alpha)$, so that such a
generalization indeed has an advantage. Theoretically, the ideal estimator
$\hat f_{\alpha_0}$ is the best estimator in the class, which surpasses the
estimators $\hat f_{HJ}$, $\hat f_{LL}$ and $\hat f_{HG}$ in the sense of
AMISE.
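The linearity in $\alpha$ stated in (4.5) can be checked by a direct expansion. The following is a sketch of that calculation; the shorthands $r = g_0'/g_0$ and $s = g_0''/g_0$ are introduced here for convenience only, and $b_1 = f'' - s f$, $b_2 = 2r(f' - r f)$ are the functions of (4.3) and (4.4) written in that shorthand.

```latex
\begin{align*}
\frac{(g_0^{a} f)''}{g_0^{a}} &= f'' + 2a\,r f' + \{a(a-1)r^2 + a s\} f,
\qquad
\frac{f\,(g_0^{b})''}{g_0^{b}} = \{b(b-1)r^2 + b s\} f .
\intertext{Setting $a = 1-\alpha$ and $b = 2-\alpha$ and subtracting,}
\frac{(g_0^{1-\alpha} f)''}{g_0^{1-\alpha}}
  - \frac{f\,(g_0^{2-\alpha})''}{g_0^{2-\alpha}}
&= f'' + 2(1-\alpha)\,r f' - 2(1-\alpha)\,r^2 f - s f \\
&= \underbrace{(f'' - s f)}_{b_1}
  + \underbrace{2r(f' - r f)}_{b_2}
  - \alpha\,\underbrace{2r(f' - r f)}_{b_2}.
\end{align*}
```

Squaring and integrating the last line gives the quadratic $c_1\alpha^2 - 2c_2\alpha + c_3$, whose vertex is at $c_2/c_1$.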

5. Asymptotic comparison. In this section the proposed $\hat f_\alpha$ is
compared to $\tilde f$ based on the AMISE formulas described in Section 4.

5.1. Comparison in normal mixture. Here we compare $\hat f_\alpha$ and
$\tilde f$ for the case in which $f$ belongs to the class of normal mixture
densities. Let

    $f(x) = \sum_{i=1}^{k}p_i f_i(x),$

where

    $f_i(x) = \frac{1}{\sigma_i}\phi\!\left(\frac{x-\mu_i}{\sigma_i}\right) = \phi_{\sigma_i}(x-\mu_i),$

$\phi$ is the standard normal density function and $\sum_{i=1}^{k}p_i = 1$.
The family of such mixtures forms a very wide and flexible class of densities.
Marron and Wand (1992) studied such mixtures and singled out 15 different
densities which are often used as test densities in the study of the
performance of density estimators [Hjort and Glad (1995), Jones and Signorini
(1997) and Jones, Linton and Nielsen (1995)]. It is easy to see that

    $\mu_0 = \int x f(x)\,dx = \sum_{i=1}^{k}p_i\mu_i,$

    $\sigma_0^2 = \int (x-\mu_0)^2 f(x)\,dx = \sum_{i=1}^{k}p_i\{\sigma_i^2 + (\mu_i-\mu_0)^2\}.$

For the present estimator $\hat f_\alpha$, we adopt here the normal density
$\phi_{\sigma_0}(x-\mu_0)$ as $g_0(x) = g(x,\theta_0)$. This corresponds to
the use of maximum likelihood estimates (MLE) for estimation of $\theta_0$,
since the normal density that has mean $\mu_0$ and variance $\sigma_0^2$
minimizes the Kullback-Leibler distance from $f(x)$ to
$g(x,\theta) = \phi_{\sigma}(x-\mu)$, where $\theta = (\mu,\sigma^2)$ and
$\theta_0 = (\mu_0,\sigma_0^2)$.
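The least false Gaussian parameters of a normal mixture follow directly from the two moment formulas above. A minimal sketch is given below; the function name is hypothetical, and the example mixture (equal weights, means -1 and 1, standard deviations 2/3) is quoted from memory as the bimodal test density of Marron and Wand (1992), so it should be treated as an assumption.

```python
def least_false_normal(weights, means, sds):
    """Return (mu_0, sigma_0^2) of the mixture sum_i p_i N(mu_i, sigma_i^2)."""
    mu0 = sum(p * m for p, m in zip(weights, means))
    var0 = sum(p * (s * s + (m - mu0) ** 2)
               for p, m, s in zip(weights, means, sds))
    return mu0, var0


# Assumed parameters of a symmetric bimodal mixture (illustration only).
mu0, var0 = least_false_normal([0.5, 0.5], [-1.0, 1.0], [2.0 / 3.0, 2.0 / 3.0])
```

For this symmetric example the mean is zero and the variance is the common component variance plus the squared distance of each mean from zero.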

The previous section indicates that the AMISE comparison is performed by
comparing $\mathcal{R}(\hat f_\alpha)$ and $\mathcal{R}(\tilde f)$. Both can
be calculated through (4.1) and (4.2) using numerical integration. However,
when $f$ is a normal mixture and $g_0$ is normal, we obtain the analytic
expression of $\mathcal{R}(\hat f_\alpha)$ by obtaining those of $c_1$, $c_2$
and $c_3$. Referring to (4.3) and (4.4), direct computation yields

    $b_1(x) = \sum_{i=1}^{k}p_i f_i(x)\left\{\frac{1}{\sigma_i^2}H_2\!\left(\frac{x-\mu_i}{\sigma_i}\right) - \frac{1}{\sigma_0^2}H_2\!\left(\frac{x-\mu_0}{\sigma_0}\right)\right\},$

    $b_2(x) = 2\sum_{i=1}^{k}p_i f_i(x)\left\{\frac{1}{\sigma_0\sigma_i}H_1\!\left(\frac{x-\mu_0}{\sigma_0}\right)H_1\!\left(\frac{x-\mu_i}{\sigma_i}\right) - \frac{1}{\sigma_0^2}H_1\!\left(\frac{x-\mu_0}{\sigma_0}\right)^2\right\},$

where $H_k$ is the $k$th order Hermite polynomial. Since $c_1$, $c_2$ and
$c_3$ are all integrals of these functions, we find their analytic expressions
using the properties of the Hermite polynomials. The detailed calculations are
found in Naito [(1998), Sections 4 and 6]. On the other hand, the expression
of $\mathcal{R}(\tilde f)$ has already been presented in Marron and Wand
(1992). Thus, by using (4.1) and (4.2), we compare $\hat f_\alpha$ and
$\tilde f$ for 15 representative test densities used in Marron and Wand
(1992). The values of the ratio
$\mathcal{R}(\hat f_\alpha)/\mathcal{R}(\tilde f)$ for
$\alpha = 0, 1, 2, \alpha_0$ are tabulated in Table 1, in which the case
number corresponds to that used in Marron and Wand (1992). The entries in
column $\alpha_0$ are the values of $\alpha_0$ for each case. Since #1 is
normal, $\mathcal{R}(\hat f_\alpha) = 0$ for all $\alpha$, so that the ratio
is always zero in the #1 row. For example, in #6, which corresponds to a
bimodal density, the value of $\mathcal{R}(\hat f_0)/\mathcal{R}(\tilde f)$ is
1.7434 and that of $\mathcal{R}(\hat f_2)/\mathcal{R}(\tilde f)$ is 0.7705,
and the minimum of the ratio is attained at $\alpha_0 = 1.9394$ with minimum
value 0.7696.
Table 1
Comparison in normal mixture^a

  f      $\alpha = 0$   $\alpha = 1$   $\alpha = 2$   $\alpha = \alpha_0$   $\alpha_0$
  #1     0.0000         0.0000         0.0000         0.0000                -
  #2     1.0448         0.3947         0.2460         0.2356                1.7968
  #3     1.0239         0.9986         0.9925         0.9922                1.8207
  #4     1.0010         0.9799         0.9606         0.8719                11.7075
  #5     1.0436         0.8826         0.7822         0.7414                3.1606
  #6     1.7434         0.9980         0.7705         0.7696                1.9394
  #7     1.4821         0.9829         0.8524         0.8485                1.8541
  #8     1.5398         1.0114         0.9007         0.8892                1.7651
  #9     1.3088         1.0010         0.9178         0.9159                1.8706
  #10    1.0512         0.9947         0.9791         0.9788                1.8787
  #11    1.0003         1.0000         0.9999         0.9999                1.8597
  #12    1.0236         1.0036         1.0025         1.0007                1.5589
  #13    1.0005         1.0000         0.9999         0.9999                1.7840
  #14    1.0030         1.0004         1.0002         1.0000                1.5897
  #15    1.0127         1.0013         1.0001         0.9994                1.6190

^a Values of the ratio $\mathcal{R}(\hat f_\alpha)/\mathcal{R}(\tilde f)$ are
tabulated for the 15 densities in Marron and Wand (1992). Values of the
optimal index $\alpha_0$ defined in (4.10) are listed in the $\alpha_0$ column
for each case.



We can confirm that Proposition 1 holds and $\hat f_{\alpha_0}$ is better
than, or at least competitive with, $\tilde f$ for all cases in this
comparison. Furthermore, it is worth noting that $\alpha_0$ is around 2,
except for #4 and #5. This reveals that the Hjort and Glad estimator
$\hat f_{HG} = \hat f_2$ is also good for almost all cases.
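Because the integrated squared bias is quadratic in the index, its values at the indices 0, 1 and 2 determine the optimal index and the minimum. The sketch below recovers both from the three tabulated ratios of the #6 row; the function names are hypothetical, and since the ratios share a common denominator, the recovered coefficients are the $c_i$ divided by that denominator.

```python
def quadratic_from_ratios(r0, r1, r2):
    """Solve c1*a^2 - 2*c2*a + c3 = r_a for a = 0, 1, 2."""
    c3 = r0
    c1 = (r2 - 2.0 * r1 + r0) / 2.0
    c2 = (c1 + c3 - r1) / 2.0
    return c1, c2, c3


def optimal_index(c1, c2, c3):
    """Vertex of the quadratic: minimiser c2/c1 and minimum c3 - c2^2/c1."""
    return c2 / c1, c3 - c2 * c2 / c1


# Ratios from the #6 row at index 0, 1 and 2.
c1, c2, c3 = quadratic_from_ratios(1.7434, 0.9980, 0.7705)
alpha_opt, ratio_min = optimal_index(c1, c2, c3)
```

Up to the four-decimal rounding of the table entries, the recovered values agree with the tabulated optimal index 1.9394 and minimum ratio 0.7696.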

5.2. Comparison in skew-normal. Similar to the previous section, the
comparison of $\hat f_\alpha$ and $\tilde f$ is performed for the case in
which $f$ belongs to a class of skew-normal distributions discussed in
Azzalini (1985). If a random variable $X$ has density
$f(x) = 2\phi(x)\Phi(\lambda x)$, where $\Phi$ is the distribution function of
the standard normal, then we say that $X$ has skew-normal distribution with
parameter $\lambda$ and we denote this by $X \sim SN(\lambda)$. Here $SN(0)$
corresponds to the standard normal. We obtain from direct calculations that

(5.1)    $f'(x) = 2\phi(x)s_1(x,\lambda), \qquad f''(x) = 2\phi(x)s_2(x,\lambda),$

where

    $s_1(x,\lambda) = \lambda\phi(\lambda x) - H_1(x)\Phi(\lambda x),$

    $s_2(x,\lambda) = H_2(x)\Phi(\lambda x) - (\lambda^3 + 2\lambda)H_1(x)\phi(\lambda x)$

and $H_k$ is the $k$th order Hermite polynomial. In addition, we adopt the
normal density as an initial approximation and the MLE for estimation of the
parameter included in the parametric model. We have for $X \sim SN(\lambda)$,

    $\mu_0 = \int x f(x)\,dx = \sqrt{\frac{2}{\pi}}\,\frac{\lambda}{\sqrt{1+\lambda^2}},$

    $\sigma_0^2 = \int (x-\mu_0)^2 f(x)\,dx = 1 - \frac{2\lambda^2}{\pi(1+\lambda^2)},$

which gives the least false parameter vector $\theta_0 = (\mu_0,\sigma_0^2)$
for $g_0(x) = \phi_{\sigma_0}(x-\mu_0)$. To find the best estimator, it is
required to obtain $b_1(x)$ and $b_2(x)$ in (4.3) and (4.4), respectively.
Direct computations yield

    $b_1(x) = 2\phi(x)\left[s_2(x,\lambda) - \frac{1}{\sigma_0^2}H_2\!\left(\frac{x-\mu_0}{\sigma_0}\right)\Phi(\lambda x)\right],$

    $b_2(x) = -4\phi(x)\left[\frac{1}{\sigma_0}s_1(x,\lambda)H_1\!\left(\frac{x-\mu_0}{\sigma_0}\right) + \frac{1}{\sigma_0^2}H_1\!\left(\frac{x-\mu_0}{\sigma_0}\right)^2\Phi(\lambda x)\right].$

Using these, we can obtain $\mathcal{R}(\hat f_\alpha)$, and we have from
(4.2) and (5.1) that

    $\mathcal{R}(\tilde f) = \int\{2\phi(x)s_2(x,\lambda)\}^2\,dx.$

Table 2 exhibits the comparison for $\lambda = 0(1)5$. For each $\lambda$ the
ratio $\mathcal{R}(\hat f_\alpha)/\mathcal{R}(\tilde f)$ is tabulated. Since
$\lambda = 0$ implies $f = g_0$, the ratios are zero for all $\alpha$. For any
$\lambda$ utilized in this comparison, we observe that $\hat f_\alpha$ for
$\alpha = 1, 2, \alpha_0$ are all superior to $\tilde f$.
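The second-derivative formula for the skew-normal density used in this subsection can be checked numerically. The sketch below compares the closed form, with $H_1(x) = x$ and $H_2(x) = x^2 - 1$, against a central second difference; the function names and the step size are choices made here for illustration.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x, lam):
    """Density 2 * phi(x) * Phi(lam * x) of SN(lam)."""
    return 2.0 * phi(x) * Phi(lam * x)


def s2(x, lam):
    """H_2(x) * Phi(lam x) - (lam^3 + 2 lam) * H_1(x) * phi(lam x)."""
    return (x * x - 1.0) * Phi(lam * x) - (lam ** 3 + 2.0 * lam) * x * phi(lam * x)


def second_derivative(x, lam):
    """Closed form f''(x) = 2 * phi(x) * s2(x, lam)."""
    return 2.0 * phi(x) * s2(x, lam)


def central_second_difference(func, x, eps=1e-4):
    """Finite-difference approximation of the second derivative of func at x."""
    return (func(x + eps) - 2.0 * func(x) + func(x - eps)) / (eps * eps)
```

Agreement at a handful of points is a sanity check of the algebra rather than a proof.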

Table 2
Comparison in skew-normal^a

  f                $\alpha = 0$   $\alpha = 1$   $\alpha = 2$   $\alpha = \alpha_0$   $\alpha_0$
  $\lambda = 0$    0.0000         0.0000         0.0000         0.0000                -
  $\lambda = 1$    0.0762         0.0232         0.0134         0.0118                1.7270
  $\lambda = 2$    0.7636         0.2669         0.1645         0.1531                1.7594
  $\lambda = 3$    1.4625         0.5783         0.3945         0.3748                1.7624
  $\lambda = 4$    1.7888         0.7836         0.5839         0.5583                1.7480
  $\lambda = 5$    1.8678         0.8963         0.7133         0.6850                1.7320

^a Values of the ratio $\mathcal{R}(\hat f_\alpha)/\mathcal{R}(\tilde f)$ are
tabulated for $\lambda = 0(1)5$ in $SN(\lambda)$ proposed by Azzalini (1985).
Values of the optimal index $\alpha_0$ defined in (4.10) are listed in the
$\alpha_0$ column for each $SN(\lambda)$.



6. Index selection. In this section, three data-based methods used to select
the index $\alpha$ are discussed. These methods are somewhat intuitive, but
the density estimators with the index obtained through these methods perform
well, as shown in the simulation report in Section 7.

6.1. Direct method. We propose a data-based selection of $\alpha$ which is a
derivative of that of $h$ discussed in Hjort and Glad [(1995), Section 6]. We
consider the Hermite expansion given as

(6.1)    $f(x) = \frac{1}{\sigma}\phi\!\left(\frac{x-\mu}{\sigma}\right)\sum_{k=0}^{m}\frac{\gamma_k}{k!}H_k\!\left(\frac{x-\mu}{\sigma}\right),$

where $\gamma_0 = 1$ and $\gamma_1 = \gamma_2 = 0$. We know that
$\gamma_k = E[H_k((X-\mu)/\sigma)]$. Simple but somewhat tedious computations,
along with the Gaussian initial approximation
$g_0(x) = \phi_{\sigma}(x-\mu)$ and $m = 5$, yield



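(6.2)    $c_1 = \frac{1}{\sigma^5\sqrt{\pi}}\left[\gamma_3^2\left(\frac{7}{16}\right) + \frac{\gamma_4^2}{9}\left(\frac{33}{32}\right) + \frac{\gamma_5^2}{144}\left(\frac{225}{64}\right) - \frac{\gamma_3\gamma_5}{6}\left(\frac{21}{32}\right)\right],$

(6.3)    $c_2 = \frac{1}{\sigma^5\sqrt{\pi}}\left[\gamma_3^2\left(\frac{3}{4}\right) + \frac{\gamma_4^2}{9}\left(\frac{57}{32}\right) + \frac{\gamma_5^2}{144}\left(\frac{195}{32}\right) - \frac{\gamma_3\gamma_5}{6}\left(\frac{39}{32}\right)\right],$

(6.4)    $c_3 = \frac{1}{\sigma^5\sqrt{\pi}}\left[\gamma_3^2\left(\frac{3}{2}\right) + \frac{\gamma_4^2}{9}\left(\frac{123}{32}\right) + \frac{\gamma_5^2}{144}\left(\frac{225}{16}\right) - \frac{\gamma_3\gamma_5}{2}\right].$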
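The rational coefficients in (6.2)-(6.4) can be reproduced with exact arithmetic. The sketch below assumes the Gaussian start and the Hermite expansion with $m = 5$ described above. Writing $u = (x-\mu)/\sigma$ and $S(u) = \sum_k \gamma_k H_k(u)/k!$, it uses $b_2 \propto -2uS'(u)$ and $b_1 + b_2 \propto S''(u) - 4uS'(u)$, together with the fact that $\phi(u)^2$ is $1/(2\sqrt{\pi})$ times the $N(0,1/2)$ density. All function names are hypothetical.

```python
from fractions import Fraction as Fr

U = [Fr(0), Fr(1)]  # the polynomial u
# Probabilists' Hermite polynomials H_1..H_4 as coefficient lists (low to high).
H = {1: [Fr(0), Fr(1)],
     2: [Fr(-1), Fr(0), Fr(1)],
     3: [Fr(0), Fr(-3), Fr(0), Fr(1)],
     4: [Fr(3), Fr(0), Fr(-6), Fr(0), Fr(1)]}
FACTORIAL = {3: 6, 4: 24, 5: 120}


def poly_mul(a, b):
    out = [Fr(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    n = max(len(a), len(b))
    a = a + [Fr(0)] * (n - len(a))
    b = b + [Fr(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def scale(p, c):
    return [c * x for x in p]


def moment(poly):
    """E[poly(V)] for V ~ N(0, 1/2), using E[V^(2k)] = (2k-1)!! / 2^k."""
    total = Fr(0)
    for power, coef in enumerate(poly):
        if power % 2 == 0:
            double_factorial = Fr(1)
            for odd in range(1, power, 2):
                double_factorial *= odd
            total += coef * double_factorial / Fr(2) ** (power // 2)
    return total


def factors(k):
    """Coefficient of gamma_k in -2uS' and in S'' - 4uS' (H_k' = k H_{k-1})."""
    d1 = scale(H[k - 1], Fr(k, FACTORIAL[k]))
    d2 = scale(H[k - 2], Fr(k * (k - 1), FACTORIAL[k]))
    b2 = scale(poly_mul(U, d1), Fr(-2))
    b12 = poly_add(d2, scale(poly_mul(U, d1), Fr(-4)))
    return b2, b12


def bracket_coefficient(j, k):
    """Coefficient of gamma_j*gamma_k inside the brackets of c1, c2, c3."""
    b2j, b12j = factors(j)
    b2k, b12k = factors(k)
    mult = Fr(1, 2) if j == k else Fr(1)  # 1/2 from phi^2; doubled off-diagonal
    c1 = mult * moment(poly_mul(b2j, b2k))
    c2 = mult * (moment(poly_mul(b2j, b12k)) + moment(poly_mul(b2k, b12j))) / 2
    c3 = mult * moment(poly_mul(b12j, b12k))
    return c1, c2, c3
```

For instance, `bracket_coefficient(3, 3)` returns the three coefficients of $\gamma_3^2$, and `bracket_coefficient(3, 4)` returns zeros because the integrand is odd.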



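Here $c_i$, $i = 1, 2, 3$, are estimated in the usual manner by substituting

    $\hat\gamma_k = \frac{1}{n}\sum_{i=1}^{n}H_k\!\left(\frac{X_i-\hat\mu}{\hat\sigma}\right)$

for $\gamma_k$, where $k = 3, 4, 5$, and by substituting $\hat\sigma$ for
$\sigma$. The next step is to use nonparametric estimators of $c_1$ and $c_2$
defined by

    $\hat c_1(h) = \int\{\hat b_2(x;h)\}^2\,dx,$

    $\hat c_2(h) = \int \hat b_2(x;h)\{\hat b_1(x;h) + \hat b_2(x;h)\}\,dx,$

where

    $\hat b_1(x;h) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{h^3}K''\!\left(\frac{x-X_i}{h}\right) - \frac{1}{h}K\!\left(\frac{x-X_i}{h}\right)\frac{g''(x,\hat\theta)}{g(x,\hat\theta)}\right],$

    $\hat b_2(x;h) = \frac{2}{n}\sum_{i=1}^{n}\left[\frac{1}{h^2}K'\!\left(\frac{x-X_i}{h}\right)\frac{g'(x,\hat\theta)}{g(x,\hat\theta)} - \frac{1}{h}K\!\left(\frac{x-X_i}{h}\right)\left(\frac{g'(x,\hat\theta)}{g(x,\hat\theta)}\right)^2\right],$

$K$ is a kernel, which may be different from that used in $\hat f_\alpha$, and
$h$ is the bandwidth. Using these quantities, we choose $\alpha$ as follows.
First, we obtain $\hat c_i$, $i = 1, 2, 3$, from (6.2)-(6.4), respectively,
using $\hat\gamma_k$, $k = 3, 4, 5$, and $\hat\sigma$, under the assumption
that the underlying distribution is approximated by the Hermite expansion.
Then, referring to (4.11), $\mathcal{R}(\hat f_{\alpha_0})$ is estimated as

    $\hat{\mathcal{R}}(\hat f_{\alpha_0}) = \hat c_3 - \frac{(\hat c_2)^2}{\hat c_1}.$

This gives a bandwidth

(6.5)    $\hat h = \left\{\frac{R(K)}{\mu_{2,K}^2}\right\}^{1/5}\hat{\mathcal{R}}(\hat f_{\alpha_0})^{-1/5}\,n^{-1/5},$

from which we have an estimate of the optimal index,

(6.6)    $\hat\alpha_0^{[1]} = \frac{\hat c_2(\hat h)}{\hat c_1(\hat h)}.$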



6.2. Two methods based on functional estimation. Here we propose two methods
based on estimation of functionals of $f$ and $g(x,\theta)$. Define

(6.7)    $q_1(x) = \frac{g_0'(x)}{g_0(x)},$

(6.8)    $q_2(x) = \frac{g_0''(x)}{g_0(x)} = q_1'(x) + \{q_1(x)\}^2,$

where $g_0(x) = g(x,\theta_0)$. Using this notation, we have

    $c_1 = 4\int f'(x)^2q_1(x)^2\,dx + 4\int f(x)^2q_1(x)^4\,dx - 8\int f(x)f'(x)q_1(x)^3\,dx,$

    $c_2 = c_1 + 2\int f'(x)f''(x)q_1(x)\,dx - 2\int f(x)f'(x)q_1(x)q_2(x)\,dx$
    $\qquad - 2\int f(x)f''(x)q_1(x)^2\,dx + 2\int f(x)^2q_1(x)^2q_2(x)\,dx.$

Under sufficient smoothness conditions on $f$, it follows that

    $\int f'(x)^2q_1(x)^2\,dx = -E_f[f''(X)q_1(X)^2] - 2E_f[f'(X)q_1(X)q_2(X)] + 2E_f[f'(X)q_1(X)^3]$

and

    $\int f'(x)f''(x)q_1(x)\,dx = -E_f[f'''(X)q_1(X)] - E_f[f''(X)q_2(X)] + E_f[f''(X)q_1(X)^2].$

These calculations allow us to define

    $\psi(p|r,s) = E_f[f^{(p)}(X)q_1(X)^rq_2(X)^s]$

for integers $p = 0, 1, 2, 3$, $r = 0, 1, 2, 3, 4$ and $s = 0, 1, 2$, where
$f^{(p)}(x) = (d^p/dx^p)f(x)$ and $f^{(0)}(x) = f(x)$. Then we have

    $c_1 = 4\{\psi(0|4,0) - \psi(2|2,0) - 2\psi(1|1,1)\},$

    $c_2 = c_1 + 2\{\psi(0|2,1) - \psi(3|1,0) - \psi(2|0,1) - \psi(1|1,1)\},$

so that the optimal $\alpha_0$ in (4.10) can be written in terms of $\psi$ as

    $\alpha_0 = \frac{c_2}{c_1} = 1 + \frac{1}{2}\,\frac{\psi(0|2,1) - \psi(3|1,0) - \psi(2|0,1) - \psi(1|1,1)}{\psi(0|4,0) - \psi(2|2,0) - 2\psi(1|1,1)} = 1 + \frac{\mathcal{N}}{2\mathcal{D}},$

where

    $\mathcal{N} = \psi(0|2,1) - \psi(3|1,0) - \psi(2|0,1) - \psi(1|1,1),$

    $\mathcal{D} = \psi(0|4,0) - \psi(2|2,0) - 2\psi(1|1,1).$

By the above reductions, data-based selection of $\alpha$ is accomplished by
using an estimator of $\alpha_0$ defined by

    $\hat\alpha_0(g) = 1 + \frac{\hat{\mathcal{N}}_g}{2\hat{\mathcal{D}}_g} = 1 + \frac{1}{2}\,\frac{\hat\psi_g(0|2,1) - \hat\psi_g(3|1,0) - \hat\psi_g(2|0,1) - \hat\psi_g(1|1,1)}{\hat\psi_g(0|4,0) - \hat\psi_g(2|2,0) - 2\hat\psi_g(1|1,1)},$

where

    $\hat\psi_g(p|r,s) = \frac{1}{n(n-1)}\sum_{i\neq j}L_g^{(p)}(X_i - X_j)\,\hat q_1(X_i)^r\hat q_2(X_i)^s$

is a nonparametric estimator of $\psi(p|r,s)$ that has a symmetric kernel $L$
and bandwidth $g$ that are possibly different from $K$ and $h$, respectively.
In addition, $\hat q_1$ and $\hat q_2$ are, respectively, those of (6.7) and
(6.8) using $g(x,\hat\theta)$ rather than $g_0(x)$.

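The behavior of $\hat\alpha_0(g)$ can be investigated by a method based on the
theory of estimating the density functional [see, e.g., Section 3.5 in Wand
and Jones (1995)]. Mean squared error (MSE) is adopted to evaluate
$\hat{\mathcal{N}}_g$ and $\hat{\mathcal{D}}_g$, while $\hat\alpha_0(g)$ is
evaluated by mean squared relative error (MSRE). Somewhat tedious calculations
yield the following theorem:

Theorem 2. As $n \to \infty$ and $g \to 0$,

(6.9)    $\mathrm{MSE}[\hat{\mathcal{N}}_g] = \frac{g^4}{4}\mu_{2,L}^2\,\mathcal{N}[2]^2 + \frac{1}{2n^2g^5}\iint\lambda_{2|3}(x,z)^2\,dx\,dz + O(n^{-1}) + o(g^4 + n^{-2}g^{-5}),$

(6.10)   $\mathrm{MSE}[\hat{\mathcal{D}}_g] = \frac{g^4}{4}\mu_{2,L}^2\,\mathcal{D}[2]^2 + \frac{1}{2n^2g^5}\iint\kappa_2(x,z)^2\,dx\,dz + O(n^{-1}) + o(g^4 + n^{-2}g^{-5}),$

(6.11)   $\mathrm{MSRE}[\hat\alpha_0(g)] = \frac{g^4}{16}\mu_{2,L}^2\left[\frac{\mathcal{N}[2]}{\mathcal{N}[0]} - \frac{\mathcal{D}[2]}{\mathcal{D}[0]}\right]^2 + \frac{1}{8n^2g^5}\iint\left[\frac{\lambda_{2|3}(x,z)}{\mathcal{N}[0]} - \frac{\kappa_2(x,z)}{\mathcal{D}[0]}\right]^2dx\,dz$
         $\qquad + O(n^{-1}) + o(g^4 + n^{-2}g^{-5} + n^{-1}),$

where

    $\lambda_{p_2|p_1}(x,z) = f(x)\bigl[\{2L^{(p_2)}(z) + zL^{(p_1)}(z)\}q_2(x) - zL^{(p_1)}(z)q_1(x)^2\bigr],$

    $\kappa_{p_2}(x,z) = f(x)\bigl[2L^{(p_2)}(z)q_1(x)^2\bigr]$

for even $p_2$ and $p_1 = p_2 + 1$, and

    $\mathcal{N}[p] = \psi(p|2,1) - \psi(p+3|1,0) - \psi(p+2|0,1) - \psi(p+1|1,1),$

    $\mathcal{D}[p] = \psi(p|4,0) - \psi(p+2|2,0) - 2\psi(p+1|1,1)$

for even $p$ with $\mathcal{N}[0] = \mathcal{N}$ and
$\mathcal{D}[0] = \mathcal{D}$.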

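The proof of Theorem 2 is presented in Section 9. From Theorem 2 the
approximate mean squared error (AMSE)-optimal bandwidths for
$\hat{\mathcal{N}}_g$ and $\hat{\mathcal{D}}_g$, and the approximate mean
squared relative error (AMSRE)-optimal bandwidth for $\hat\alpha_0(g)$ are,
respectively, given as

    $g_{\mathcal{N}\text{-AMSE}} = \left[\frac{5\iint\lambda_{2|3}(x,z)^2\,dx\,dz}{2\mu_{2,L}^2\,\mathcal{N}[2]^2}\right]^{1/9}n^{-2/9},$

    $g_{\mathcal{D}\text{-AMSE}} = \left[\frac{5\iint\kappa_2(x,z)^2\,dx\,dz}{2\mu_{2,L}^2\,\mathcal{D}[2]^2}\right]^{1/9}n^{-2/9}$

and

    $g_{\mathrm{AMSRE}} = \left[\frac{5\iint\{\mathcal{D}\lambda_{2|3}(x,z) - \mathcal{N}\kappa_2(x,z)\}^2\,dx\,dz}{2\mu_{2,L}^2\,\{\mathcal{D}\mathcal{N}[2] - \mathcal{N}\mathcal{D}[2]\}^2}\right]^{1/9}n^{-2/9}.$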
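Each of these bandwidths minimizes a criterion of the generic form $A g^4 + B n^{-2} g^{-5}$, where $A$ collects the squared-bias constant and $B$ the variance constant. The sketch below states that minimizer in closed form and offers a quick numerical check; the function names and the constants in the example are hypothetical.

```python
def approximate_mse(g, n, bias_const, var_const):
    """Generic criterion A * g^4 + B / (n^2 * g^5)."""
    return bias_const * g ** 4 + var_const / (n ** 2 * g ** 5)


def optimal_bandwidth(n, bias_const, var_const):
    """Stationary point: 4*A*g^3 = 5*B*n^-2*g^-6, so g^9 = 5*B / (4*A*n^2)."""
    return (5.0 * var_const / (4.0 * bias_const)) ** (1.0 / 9.0) * n ** (-2.0 / 9.0)


# Hypothetical constants, for illustration only.
n_obs, a_const, b_const = 200, 0.3, 1.7
g_star = optimal_bandwidth(n_obs, a_const, b_const)
```

The rate $n^{-2/9}$ arises because the variance term is of order $n^{-2}g^{-5}$ rather than the $(ng)^{-1}$ familiar from density estimation itself.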

Unfortunately, these bandwidths have the same defect as the plug-in method
for bandwidth selection of the kernel density estimator: all of these
bandwidths depend on the unknown $\mathcal{N}[2]$, $\mathcal{D}[2]$,
$\mathcal{N}$ and $\mathcal{D}$. Estimation of $\mathcal{N}[2]$ and
$\mathcal{D}[2]$ is possible; however, their optimal bandwidths depend on
$\mathcal{N}[4]$ and $\mathcal{D}[4]$. Furthermore, it is easily seen that
this problem recurs at every stage.

To overcome this problem, we utilize a simple estimate based on the Hermite
expansion of (6.1). Equation (6.1) yields a pilot estimate of $f^{(p)}(x)$ as

    $\hat f^{(p)}(x) = \frac{(-1)^p}{\hat\sigma^{p+1}}\,\phi\!\left(\frac{x-\hat\mu}{\hat\sigma}\right)\sum_{k=0}^{m}\frac{\hat\gamma_k}{k!}H_{k+p}\!\left(\frac{x-\hat\mu}{\hat\sigma}\right),$

from which we have
$\hat{\mathcal{N}}[6] = \hat\psi(6|2,1) - \hat\psi(9|1,0) - \hat\psi(8|0,1) - \hat\psi(7|1,1)$
as an estimate of $\mathcal{N}[6]$ using the component defined by

    $\hat\psi(p|r,s) = \frac{(-1)^p}{\hat\sigma^{p}}\sum_{k=0}^{m}\frac{\hat\gamma_k}{k!}\,\frac{1}{n}\sum_{i=1}^{n}\phi_{\hat\sigma}(X_i-\hat\mu)\,H_{k+p}\!\left(\frac{X_i-\hat\mu}{\hat\sigma}\right)\hat q_1(X_i)^r\hat q_2(X_i)^s.$

An estimate $\hat{\mathcal{D}}[6]$ of $\mathcal{D}[6]$ can be obtained in the
same manner.

In the following text we describe the algorithm used to obtain two estimates
of $\alpha_0$. The notation utilized is

    $L^{[1]}(p_1,p_2) = \mu_{2,L^{(p_1)}L^{(p_1)}} + 4\mu_{0,L^{(p_2)}L^{(p_2)}} + 4\mu_{1,L^{(p_1)}L^{(p_2)}},$

    $L^{[2]}(p_1,p_2) = 4\mu_{1,L^{(p_1)}L^{(p_2)}} + 2\mu_{2,L^{(p_1)}L^{(p_1)}},$

    $L^{[3]}(p_1,p_2) = 4\mu_{0,L^{(p_2)}L^{(p_2)}} + 2\mu_{1,L^{(p_1)}L^{(p_2)}}$

for nonnegative integers $p_1$ and $p_2$, and

    $\hat\Lambda^2_{p_2|p_1}(\beta) = L^{[1]}(p_1,p_2)\hat\psi_\beta(0|0,2) - L^{[2]}(p_1,p_2)\hat\psi_\beta(0|2,1) + \mu_{2,L^{(p_1)}L^{(p_1)}}\hat\psi_\beta(0|4,0),$

    $\hat K^2_{p_2}(\beta') = 4\mu_{0,L^{(p_2)}L^{(p_2)}}\hat\psi_{\beta'}(0|4,0)$

for bandwidths $\beta$ and $\beta'$. Detailed calculations needed to derive
some of the equations in the sequel are omitted, but are available from the
author.

1. Compute $\tilde{\mathcal N}[6]$ and $\tilde{\mathcal D}[6]$.

2. Compute $\Lambda_{6|7}(\beta_{n1})$ and $\kappa_6(\beta_{d1})$ for some appropriately chosen
bandwidths $\beta_{n1}$ and $\beta_{d1}$, and then compute



9nl 



9di 



13\ A 6|7(^l) 

13 \ K§(fla) ■ 



1/17 



» 



2 / AiPW 



1/17 



H 



-2/17 



-2/17 



3. Compute $\Lambda_{4|5}(\beta_{n2})$ and $\kappa_4(\beta_{d2})$ for some appropriately chosen
bandwidths $\beta_{n2}$ and $\beta_{d2}$, and then compute

-9\ AL(/3 n2 ) !i/i3 



5n2 



5d2 



9\ «!(^ 2 ) 



/; 



2 ^1l^[4] 2 J 



1/13 



V 



-2/13 



-2/13 
4. Compute $\Lambda_{2|3}(\beta_{n3})$ and $\kappa_2(\beta_{d3})$ for some appropriately chosen
bandwidths $\beta_{n3}$ and $\beta_{d3}$, and then compute





9n3 — 


V5\ A\a(M - 


1/9 




A2//4 L A/; n2 [2p. 






9d3 = 


\(5\ «l(/3«e) 1 


1/9 
n 


5. Compute 

* 


5 


5AMSRE 


Wti^JfgJft-ftgJDgaW* 




X 


^^(3,2)4,(010,2) 





-2/9 



2/9 



1/9 



" {^L L[21 ( 3 > 2 ) +2AG«3^^ [3I (3,2)}^ 



12,1) 



±MgA<alhj,mLV> A (°l 4 > °) 



1/9 



x n 



-2/9 



for some appropriately chosen bandwidth $\beta_0$.
6. Compute two estimates of $\alpha_0$ defined as



(6.12) 
and 



o 



P] 



&o(g. 



AMSRE 



(6.13) 



«[3] =1 + I A/ ^3 



2p 



.</.« 



aP) 



Here $\hat\alpha_0^{[2]}$ is based on AMSRE formula (6.11), so that a single bandwidth is
included. On the other hand, the two bandwidths included in $\hat\alpha_0^{[3]}$ are based
on AMSE formulas (6.9) and (6.10), which correspond to the numerator $\mathcal N$
and the denominator $\mathcal D$, respectively. The bandwidths $\beta_{n1}$, $\beta_{n2}$, $\beta_{n3}$, $\beta_{d1}$,
$\beta_{d2}$, $\beta_{d3}$ and $\beta_0$ are all determined using the formula

$$\mathrm{AMSE}\bigl[a\hat\psi_\beta(0|0,2) + b\hat\psi_\beta(0|2,1) + c\hat\psi_\beta(0|4,0)\bigr]
= \frac{\beta^4}{4}\mu_{2,L}^2\{a\psi(2|0,2) + b\psi(2|2,1) + c\psi(2|4,0)\}^2
+ \frac{2R(L)}{n^2\beta}\int f(x)^2\{a q_2(x)^2 + b q_1(x)^2 q_2(x) + c q_1(x)^4\}^2\,dx$$

for some constants $a$, $b$ and $c$. This gives the optimal $\beta$ as

$$\beta_{\mathrm{AMSE}} = \left[\frac{2R(L)\,E_f\bigl[f(X)\{a q_2(X)^2 + b q_1(X)^2 q_2(X) + c q_1(X)^4\}^2\bigr]}{\mu_{2,L}^2\{a\psi(2|0,2) + b\psi(2|2,1) + c\psi(2|4,0)\}^2}\right]^{1/5} n^{-2/5}.$$

At this stage, estimates of $\psi(0|r,s)$ and $\psi(2|r,s)$ for some pairs $(r,s)$ are
needed. These can be provided by kernel estimates of $f$ and $f^{(2)}$ that have
bandwidths obtained by the method of Härdle, Marron and Wand (1990).

The empirical behavior of $\hat f_\alpha$ for $\alpha = \hat\alpha_0^{[1]}$, $\hat\alpha_0^{[2]}$ and $\hat\alpha_0^{[3]}$ is reported in the
next section.

7. Finite sample performance. Finite sample performance of the proposed
density estimators was investigated by Monte Carlo simulation. The
first 10 densities (#1 to #10) of Marron and Wand (1992), which cover a
large variety of realistic density shapes, were used as target densities in this
simulation study. In each case 1000 samples of size $n = 500$ were generated.
The MISE($h$) value for a given bandwidth $h$ was estimated by the
average of these 1000 realizations of the integrated squared error ISE($h$). To
obtain a precise approximation to the minimum MISE, a grid search of the
bandwidth was implemented. This was done after an initial screening had
provided a suitable interval of $h$ that contained the minimum. The Gaussian
kernel was used throughout. The estimators compared in this study were
$\hat f$ and $\hat f_\alpha$ for $\alpha = 0, 1, 2, \alpha_0, \hat\alpha_0^{[k]}$, $k = 1,2,3$ [see (6.6), (6.12) and (6.13)]. We
utilized $g(x,\hat\theta) = \phi_{\hat\sigma}(x - \hat\mu)$ for all cases, where $(\hat\mu, \hat\sigma^2)$ is the MLE of $(\mu, \sigma^2)$.
Values of $10^5 \times \min \mathrm{MISE}$ are tabulated in Table 3, where the minimum is
taken over $h$. Also tabulated in parentheses for all cases and estimators are
$10^5$ times the standard error (SE) of the estimates of MISE($h$) using the
bandwidth at which the minimum MISE is obtained.
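The simulation protocol described above can be sketched in a few lines. The following is a minimal illustration, not the code used for Table 3: it treats only the traditional Gaussian-kernel estimator on density #1 (the standard normal), uses far fewer replications than the 1000 of the study, and the function names, the evaluation grid and the bandwidth grid are all assumptions made for the example.

```python
import numpy as np

def gaussian_kde_on_grid(sample, grid, h):
    """Gaussian-kernel density estimate evaluated on `grid` with bandwidth h."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * h * np.sqrt(2 * np.pi))

def ise(estimate, truth, grid):
    """Integrated squared error, approximated by a Riemann sum on a uniform grid."""
    dx = grid[1] - grid[0]
    return float(np.sum((estimate - truth) ** 2) * dx)

def mise_curve(n, n_rep, bandwidths, rng):
    """Average ISE(h) over n_rep samples of size n from N(0,1), with its standard error."""
    grid = np.linspace(-5.0, 5.0, 401)            # assumed evaluation grid
    truth = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)
    ises = np.empty((n_rep, len(bandwidths)))
    for r in range(n_rep):
        sample = rng.standard_normal(n)
        # The same sample is reused for every bandwidth on the grid.
        for j, h in enumerate(bandwidths):
            ises[r, j] = ise(gaussian_kde_on_grid(sample, grid, h), truth, grid)
    mise = ises.mean(axis=0)
    se = ises.std(axis=0, ddof=1) / np.sqrt(n_rep)
    return mise, se

rng = np.random.default_rng(0)
bandwidths = np.linspace(0.15, 0.60, 10)          # assumed screening interval
mise, se = mise_curve(n=500, n_rep=30, bandwidths=bandwidths, rng=rng)
best = int(np.argmin(mise))
print(f"h = {bandwidths[best]:.2f}, 1e5 * min MISE = {1e5 * mise[best]:.0f} "
      f"(SE {1e5 * se[best]:.0f})")
```

Reusing each sample across the whole bandwidth grid keeps the estimated MISE curve smooth in $h$, which makes the grid search for the minimum more stable than drawing fresh samples per bandwidth.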

First we consider #1. In this case $f$ is in the parametric model, so that
the $O(h^2)$ term of the bias of $\hat f_\alpha$ vanishes, as mentioned in Section 3. Therefore,
$\alpha_0$ is not defined and the estimation of $\alpha_0$ does not have meaning. Thus,
$\hat f_\alpha$ for $\alpha = \alpha_0, \hat\alpha_0^{[k]}$, $k = 1,2,3$, were not simulated for #1. For
#1 all of $\hat f_\alpha$ are significantly better than $\hat f$, and $\hat f_1$ is the best.

The tabulated values for $\hat f_\alpha$ with $\alpha = \hat\alpha_0^{[2]}$ and $\alpha = \hat\alpha_0^{[3]}$ in #4 are the
median of ISE($h$) for a given $h$ rather than MISE($h$), and the values in
parentheses are robust SEs calculated by substituting median absolute
deviations. This is because the values of MISE($h$) of these became huge and
showed unstable behavior in #4. The instability in #4 can actually be
observed, since even the value of robust median ISE($h$) is somewhat large
relative to MISE($h$) values of other estimators, and then the value of robust
SE in parentheses is also large. We can further observe from #5 that $\hat f_\alpha$
with $\alpha = \hat\alpha_0^{[3]}$ behaves unstably.
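To illustrate why a median with a MAD-based spread is more informative than a mean when a few replications blow up, here is a small sketch. The exact robust SE formula is not stated in the text; the scaling `1.4826 * MAD / sqrt(n)` used below is an assumption (the usual normal-consistency constant), and the ISE values are made up for the example.

```python
import numpy as np

def mean_with_se(ise_values):
    """Mean ISE and its usual standard error."""
    x = np.asarray(ise_values, dtype=float)
    return x.mean(), x.std(ddof=1) / np.sqrt(x.size)

def median_with_robust_se(ise_values):
    """Median ISE with a spread based on the median absolute deviation (MAD).

    Assumption: the MAD is rescaled by 1.4826 and divided by sqrt(n),
    mirroring the usual SE formula with the MAD in place of the SD.
    """
    x = np.asarray(ise_values, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return med, 1.4826 * mad / np.sqrt(x.size)

# Hypothetical ISE realizations: six ordinary values and one exploded replication.
ise_values = np.array([0.012, 0.013, 0.011, 0.014, 0.012, 0.013, 5.0])
mean_ise, se = mean_with_se(ise_values)
median_ise, robust_se = median_with_robust_se(ise_values)
print(mean_ise, se)           # dominated by the single huge value
print(median_ise, robust_se)  # reflects the typical replication
```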

Table 3
The value of estimated $\min_h \mathrm{MISE}(h) \times 10^5$ for samples of size $n = 500$ from each of the
first 10 Marron and Wand densities over 1000 simulations for $\hat f$, $\hat f_0$ ($= \hat f_{\mathrm{HJ}}$),
$\hat f_1$ ($= \hat f_{\mathrm{HH}}$), $\hat f_2$ ($= \hat f_{\mathrm{HG}}$), $\hat f_{\alpha_0}$ and $\hat f_\alpha$ with $\alpha = \hat\alpha_0^{[k]}$, $k = 1,2,3$. The standard error $\times 10^5$ is
given in parentheses for each case

Density   $\hat f$   $\alpha=0$   $\alpha=1$   $\alpha=2$   $\alpha=\alpha_0$   $\alpha=\hat\alpha_0^{[1]}$   $\alpha=\hat\alpha_0^{[2]}$   $\alpha=\hat\alpha_0^{[3]}$
#1        172       67        62        63        -         -         -          -
          (3)       (2)       (1)       (1)       -         -         -          -
#2        254       243       196       190       182       227       288        218
          (4)       (4)       (4)       (4)       (4)       (4)       (9)        (6)
#3        1,413     1,406     1,395     1,394     1,394     1,394     1,394      1,395
          (15)      (15)      (15)      (15)      (15)      (15)      (15)       (15)
#4        1,372     1,296     1,290     1,286     2,734     1,288     1,440*     1,523*
          (16)      (17)      (17)      (17)      (621)     (17)      (195)†     (213)†
#5        1,735     1,763     1,677     1,641     1,710     1,648     1,641      289,637
          (32)      (32)      (31)      (30)      (28)      (30)      (30)       (5,181)
#6        244       272       243       234       234       258       234        235
          (4)       (4)       (4)       (4)       (4)       (4)       (4)        (4)
#7        340       372       340       333       332       336       332        332
          (5)       (5)       (5)       (5)       (5)       (5)       (5)        (5)
#8        323       361       328       324       321       341       321        324
          (4)       (5)       (5)       (5)       (5)       (5)       (5)        (5)
#9        296       327       302       297       296       309       296        296
          (4)       (4)       (4)       (4)       (4)       (4)       (4)        (4)
#10       1,126     1,139     1,125     1,124     1,123     1,135     1,124      1,124
          (10)      (10)      (10)      (10)      (10)      (10)      (10)       (10)

Note: The asterisk (*) designates the minimum of median ISE and the dagger (†)
denotes robust SE using median absolute deviation.

In all cases except #4, #5 and #9, the ideal estimator $\hat f_{\alpha_0}$ is the best,
which justifies the theory presented in Section 4. In #4 and #5 $\hat f_{\alpha_0}$ is not so
good because the value $\alpha_0$ is large relative to the other cases as seen in Table
1. It seems that a larger sample is needed for #4 and #5 to confirm the
theory presented in Section 4. In addition, good performance of $\hat f_{\alpha_0}$ reveals
that the estimation of $\alpha_0$ is indeed an important problem. Estimators $\hat f_\alpha$ for
$\alpha = \hat\alpha_0^{[k]}$, $k = 1,2,3$, behave well and their differences are small in almost all
cases. For $\alpha = \hat\alpha_0^{[k]}$, $k = 2, 3$, however, $\hat f_\alpha$ were somewhat unstable relative
to $\hat\alpha_0^{[1]}$ in the sense of the SE, but the bias of these estimators was smaller
than that for $\hat\alpha_0^{[1]}$.

Some notable insights from Table 3 are as follows. For almost all cases, $\hat f_2$
surpasses $\hat f_\alpha$ for $\alpha = 0, 1$. Although the degree of improvement is marginal, use
of the estimator of $\alpha_0$ yields better performance, which is recognized in #3,
#5, #7, #8, #9 and #10. For practical situations, the choices of $\hat\alpha_0^{[2]}$ and
$\hat\alpha_0^{[3]}$ are recommended for densities that are somewhat smooth, but $\hat\alpha_0^{[1]}$ is
suited for densities that are rather kurtotic.

8. Supplements. In this section a number of supplementary results are 
presented. 



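8.1. The integral. Direct calculation yields

$$\int \hat f_\alpha(x)\,dx = 1 + \frac{h^2}{2}\mu_{2,K}\,\frac{1}{n}\sum_{i=1}^n\left[\frac{\{g(X_i,\hat\theta)^{\alpha-1}\}''}{g(X_i,\hat\theta)^{\alpha-1}} - \frac{\{g(X_i,\hat\theta)^{2-\alpha}\}''}{g(X_i,\hat\theta)^{2-\alpha}}\right] + O(h^4)$$

as $h \to 0$. In particular, when we adopt the Gaussian density $g(x,\hat\theta) =
\phi_{\hat\sigma}(x - \hat\mu) = \phi_s(x - \bar X)$ as an initial parametric start, where $\bar X$ and $s^2$ are,
respectively, the sample mean and the sample variance, we have

$$\int \hat f_\alpha(x)\,dx = 1 + \frac{h^2}{2}\mu_{2,K}\left(\frac{2\alpha-3}{s^2}\right)\frac{1}{n}\sum_{i=1}^n\left\{\left(\frac{X_i-\bar X}{s}\right)^2 - 1\right\} + O(h^4) = 1 + O(h^4)$$

as $h \to 0$.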
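The vanishing of the $h^2$ term for a Gaussian start is an exact algebraic cancellation, and it can be checked numerically. The sketch below assumes the $h^2$ coefficient has the form $(2\alpha-3)s^{-2}\,n^{-1}\sum_i\{((X_i-\bar X)/s)^2 - 1\}$; the function name and the test sample are illustrative only. It also shows that the cancellation relies on $s^2$ being the maximum likelihood variance (divisor $n$); with the divisor $n-1$ a term of order $1/n$ remains.

```python
import numpy as np

def h2_coefficient(sample, alpha, ddof=0):
    """Sample average appearing in the h^2 term for a Gaussian parametric start.

    Returns (2*alpha - 3)/s^2 * mean(((X_i - Xbar)/s)^2 - 1), where s^2 is the
    sample variance computed with divisor n - ddof.
    """
    x = np.asarray(sample, dtype=float)
    xbar = x.mean()
    s2 = x.var(ddof=ddof)
    return (2 * alpha - 3) / s2 * np.mean((x - xbar) ** 2 / s2 - 1.0)

rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=1.5, size=200)   # deliberately non-Gaussian data

coef_mle = h2_coefficient(sample, alpha=2.0, ddof=0)       # divisor n: cancels exactly
coef_unbiased = h2_coefficient(sample, alpha=2.0, ddof=1)  # divisor n-1: O(1/n) remainder
print(coef_mle, coef_unbiased)
```

The cancellation holds whatever the true density is, since it only uses the definition of the sample variance.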



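8.2. Computational remark. The practical expression for $\hat f_\alpha$ depends on
the choices of the kernel $K$ and the initial parametric model $g(x,\theta)$. Thus,
the general features required for practical calculation are not pursued here.
However, derivation of the expression for the case in which the Gaussian
kernel and model are adopted appears to be useful. Now define

$$I(a) = \int K_h(t-x)\,g(t,\hat\theta)^{-a}\,dt$$

for $K(t) = \phi(t)$ and $g(t,\hat\theta) = \phi_{\hat\sigma}(t - \hat\mu)$. Direct calculations give

$$I(a) = \frac{(\sqrt{2\pi})^{a}\,\hat\sigma^{a+1}}{\sqrt{\hat\sigma^2 - a h^2}}\,\exp\left\{\frac{a(x-\hat\mu)^2}{2(\hat\sigma^2 - a h^2)}\right\}$$

provided that $\hat\sigma^2 - a h^2 > 0$. Using this notation, we have, for the case of
Gaussian kernel and model,

$$\hat f_\alpha(x) = \frac{(\sqrt{2\pi})^{\alpha-3}\,\hat\sigma^{\alpha-2}}{n h\,I(\alpha-2)}\sum_{i=1}^n \exp\left\{-\frac{(x-\hat\mu)^2}{2\hat\sigma^2} - \frac{(X_i-x)^2}{2h^2} - (1-\alpha)\frac{(X_i-\hat\mu)^2}{2\hat\sigma^2}\right\}$$

for $\hat\sigma^2 - (\alpha-2)h^2 > 0$.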
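The closed-form expressions for the Gaussian kernel and Gaussian start translate directly into code. The following is a minimal sketch under the assumption that the estimator is $g(x,\hat\theta)\,n^{-1}\sum_i K_h(X_i-x)g(X_i,\hat\theta)^{1-\alpha} / \int K_h(t-x)g(t,\hat\theta)^{2-\alpha}dt$, so that the denominator equals $I(\alpha-2)$; the function names are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (SQRT_2PI * sigma)

def integral_I(a, x, h, mu, sigma):
    """Closed form of I(a) = int K_h(t - x) g(t)^(-a) dt for Gaussian K and g."""
    d = sigma**2 - a * h**2
    if d <= 0:
        raise ValueError("I(a) requires sigma^2 - a*h^2 > 0")
    return (SQRT_2PI**a * sigma ** (a + 1) / np.sqrt(d)
            * np.exp(a * (x - mu) ** 2 / (2.0 * d)))

def f_alpha(x, sample, h, alpha):
    """Estimator f_alpha with Gaussian kernel and Gaussian start fitted by MLE."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    mu, sigma = sample.mean(), sample.std()        # Gaussian MLEs (divisor n)
    n = sample.size
    exponent = (-(x[:, None] - mu) ** 2 / (2 * sigma**2)
                - (sample[None, :] - x[:, None]) ** 2 / (2 * h**2)
                - (1 - alpha) * (sample[None, :] - mu) ** 2 / (2 * sigma**2))
    const = SQRT_2PI ** (alpha - 3) * sigma ** (alpha - 2)
    return const * np.exp(exponent).sum(axis=1) / (
        n * h * integral_I(alpha - 2, x, h, mu, sigma))

rng = np.random.default_rng(0)
sample = rng.standard_normal(100)
grid = np.linspace(-3.0, 3.0, 7)
print(f_alpha(grid, sample, h=0.4, alpha=2.0))
```

For $\alpha = 2$ the denominator reduces to $I(0) = 1$ and the expression is the parametric start multiplied by a kernel-smoothed correction factor, while $\alpha = 1$ divides the traditional kernel estimate by the kernel-smoothed start.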



(x-A) 2 (X t -x) 2 ,(Xi-(i) 2 

- (1 - a" 



2& 2 



2h 2 



2a 2 
8.3. Choosing the bandwidth. From Section 4 we see that the bandwidth
$h$ that minimizes the AMISE for $\hat f_\alpha$ is

$$h(\alpha) = \left\{\frac{R(K)}{\mu_{2,K}^2}\right\}^{1/5}\mathcal R(f_\alpha)^{-1/5}\,n^{-1/5}$$

for a fixed $\alpha$, and the resultant minimum value of the AMISE is

$$\frac{5}{4}\{\mu_{2,K}R(K)^2\}^{2/5}\,\mathcal R(f_\alpha)^{1/5}\,n^{-4/5}.$$

Proposition 1 reveals that we can further reduce this by using $\alpha = \alpha_0$ in
(4.10). Thus, the best choice for the bandwidth $h$ is

$$h = h(\alpha_0) = \left\{\frac{R(K)}{\mu_{2,K}^2}\right\}^{1/5}\mathcal R(f_{\alpha_0})^{-1/5}\,n^{-1/5}.$$
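As a quick numerical check of these expressions, the sketch below minimizes an AMISE of the form $\tfrac{1}{4}h^4\mu_{2,K}^2\mathcal R + R(K)/(nh)$, treating the bias functional $\mathcal R$ as a supplied number. For illustration it uses the traditional kernel estimator of a standard normal density, for which the functional is $\int f''(x)^2dx = 3/(8\sqrt\pi)$; the function names are hypothetical.

```python
import math

def amise(h, n, roughness_K, mu2_K, bias_functional):
    """AMISE(h) = (h^4 / 4) * mu2_K^2 * R + R(K) / (n h)."""
    return 0.25 * h**4 * mu2_K**2 * bias_functional + roughness_K / (n * h)

def optimal_bandwidth(n, roughness_K, mu2_K, bias_functional):
    """Minimizer of AMISE(h): {R(K) / mu2_K^2}^(1/5) * R^(-1/5) * n^(-1/5)."""
    return (roughness_K / (mu2_K**2 * bias_functional * n)) ** 0.2

def minimum_amise(n, roughness_K, mu2_K, bias_functional):
    """Minimum value: (5/4) * {mu2_K * R(K)^2}^(2/5) * R^(1/5) * n^(-4/5)."""
    return 1.25 * (mu2_K * roughness_K**2) ** 0.4 * bias_functional**0.2 * n ** (-0.8)

# Gaussian kernel: R(K) = 1 / (2 sqrt(pi)), mu_{2,K} = 1.
R_K = 1.0 / (2.0 * math.sqrt(math.pi))
mu2 = 1.0
# Illustrative bias functional: int f''^2 for the N(0,1) density.
R_bias = 3.0 / (8.0 * math.sqrt(math.pi))

n = 500
h_star = optimal_bandwidth(n, R_K, mu2, R_bias)
print(h_star, minimum_amise(n, R_K, mu2, R_bias))
```

With these inputs the bandwidth reduces to $(4/(3n))^{1/5}$, the familiar normal reference rule for unit variance. A smaller bias functional, as obtained with a well-chosen $\alpha$, gives a larger bandwidth and a smaller minimum AMISE.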

Here, we propose a method to choose $h$ which is a variant of that discussed
in Hjort and Glad (1995). Recall the analogy presented in Section 6.1, and
consider the bandwidth in (6.5) and the estimate of $\alpha_0$ in (6.6). Further, we
consider a bias-adjusted version of $\mathcal R(f_\alpha)$ given as

n\a,h) = ^-\n(a,h)-^±\, 

n— I { nh° J 

where 

Tl(a,h) = c\(h)a — 2&2(h)a + 03(h) 

and 

h(h) = {bi(x;h) +b 2 (x;h)} dx. 

Here h in (6.5) is seen as an initial bandwidth. Then we calculate the final 
bandwidth as 



^2,K 






The theoretical performance of this final bandwidth is not pursued here.
However, application to some artificial data suggests empirically that the
final bandwidth is not as stable as the initial bandwidth of (6.5).

9. Proofs. In this section the proofs of theoretical results are presented. 
First, we prepare the following lemma which can be proved by Taylor ex- 
pansion. 

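Lemma 1. Let $g_0(x) = g(x,\theta_0)$ and

$$f^*_\alpha(x) = g_0(x)\,\frac{n^{-1}\sum_{i=1}^n K_h(X_i - x)\,g_0(X_i)^{1-\alpha}}{\int K_h(t-x)\,g_0(t)^{2-\alpha}\,dt}. \tag{9.1}$$

Then as $n \to \infty$, $h \to 0$,

$$\mathrm{Bias}\,f^*_\alpha(x) = \frac{h^2}{2}\mu_{2,K}\left[\frac{(g_0(x)^{1-\alpha}f(x))''}{g_0(x)^{1-\alpha}} - \frac{f(x)(g_0(x)^{2-\alpha})''}{g_0(x)^{2-\alpha}}\right] + O(h^4),$$

$$\mathrm{Var}\,f^*_\alpha(x) = \frac{R(K)f(x)}{nh} - \frac{f(x)^2}{n} + O\left(\frac{h}{n}\right).$$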



Proof of Proposition 1. The result is straightforwardly obtained
from the quadratic expression of $\mathcal R(f_\alpha)$ in (4.9). □

Proof of Theorem 1. Define

$$u_0(x) = \frac{\partial}{\partial\theta}\log g(x,\theta_0), \qquad U_0(x) = \frac{\partial^2}{\partial\theta\,\partial\theta^T}\log g(x,\theta_0).$$

Using Taylor expansions, we can expand f a as 

-6 ) T C n (x)(e-e ) + o p (n~ 1 ), 



fa(x) = r a (x) + (o-e yB n (x) + 

where /* is given as in (9.1), 

1 
n 



B n {x) = -Y j B i (x), 
n r— \ 

1 n 
C n (x) = -Y / C i (x), 



i=l 



B i (x)=K h (X i -x)g (X i ) 



i-a 9o(x) 

Vo{x) 



(1 - a)uo(Xi) -—r]i(x) + u (x) 



d{x) = K h (Xi - x)g (X l 



Vo( x ) 

a-«»(i) 



Vo(x) 



2(l-a)(2-a) 



rn(x)M x i) T + 2 (! - a)u (x)uo(Xi) rj 



w( x ) 

(1 - a){U (Xi) + (1 - a)n (X i )n (X i ) T } 



2(2 -a] 
go(x) 

+ 



u (x)rji(x) T + {U (x) + u (x)uq(x) t } 



2(2 — a) f . . . . . .71 1 . . . . 

770 (xr L 2 



where

$$\eta_0(x) = \int K_h(t-x)\,g_0(t)^{2-\alpha}\,dt,$$

$$\eta_1(x) = \int K_h(t-x)\,u_0(t)\,g_0(t)^{2-\alpha}\,dt,$$

$$\eta_2(x) = \int K_h(t-x)\{U_0(t) + (2-\alpha)u_0(t)u_0(t)^T\}\,g_0(t)^{2-\alpha}\,dt.$$



Through (3.1) and the average representations above, we have 



E[(e-9oYB n (x)}=0[ — + ^\, 



h- 



1 



n 



/r 



n- 



E[(9 - 9 y C n (x)(9 - 9 )} =0[ — + — \, 

using the fact that $I_i = I(X_i)$ has mean zero. Since the bias term of $f^*_\alpha$ in
(9.1) was already given in Lemma 1, the bias expression of $\hat f_\alpha$ is confirmed.
Next we consider variance. The variance of $f^*_\alpha$ was obtained in Lemma 1.
By using the average representation (3.1), we have, after somewhat lengthy
calculations, that



Y3x[{9-9 Q yB n {x)]=0 



+ 



\ n 



n- 



\ZZ): 



Cov[f* a (x),(9-9 ) T B n (x)] = O 



n 



h 2 



+ 



ii 



n- 



from which the necessary variance expression is derived. □ 

Proof of Theorem 2. Direct calculation yields that 

MSRE[a (g)] 



E 



a {g) 



1 



ov 



V 2 

4AA 2 



E 



V{N g -N)-N{V g -V) 



V 2 + V(Vg - V) 



MSE[A/" S ] MSE[P 9 ] 



V 2 



E{{V g -V){M g -M)} 
NV 



N 2 

where $O_{n,g}$ is a negligible higher-order term. Hence it suffices to show (6.9)
and (6.10), and to evaluate the cross term $E\{(\hat{\mathcal D}_g - \mathcal D)(\hat{\mathcal N}_g - \mathcal N)\}$ for checking
(6.11). However, only the proof of (6.9) is presented here since the other
equations can be obtained in the same manner. We therefore focus on $\hat{\mathcal N}_g$.
Then it follows that

$$\begin{aligned}
\mathrm{MSE}[\hat{\mathcal N}_g] ={}& \mathrm{MSE}[\hat\psi_g(0|2,1)] + \mathrm{MSE}[\hat\psi_g(3|1,0)] \\
&+ \mathrm{MSE}[\hat\psi_g(2|0,1)] + \mathrm{MSE}[\hat\psi_g(1|1,1)] \\
&- 2E[\hat\mu_g(0|2,1)\hat\mu_g(3|1,0)] - 2E[\hat\mu_g(0|2,1)\hat\mu_g(2|0,1)] \\
&- 2E[\hat\mu_g(0|2,1)\hat\mu_g(1|1,1)] + 2E[\hat\mu_g(3|1,0)\hat\mu_g(2|0,1)] \\
&+ 2E[\hat\mu_g(3|1,0)\hat\mu_g(1|1,1)] + 2E[\hat\mu_g(2|0,1)\hat\mu_g(1|1,1)],
\end{aligned} \tag{9.2}$$

where $\hat\mu_g(p|r,s) = \hat\psi_g(p|r,s) - \psi(p|r,s)$. Therefore, the proof is further
reduced to evaluation of $\mathrm{MSE}[\hat\psi_g(p|r,s)]$ and $E[\hat\mu_g(p_1|r_1,s_1)\hat\mu_g(p_2|r_2,s_2)]$ for
nonnegative integer triplets $(p|r,s)$, $(p_1|r_1,s_1)$ and $(p_2|r_2,s_2)$. To accomplish
the proof, the following four lemmas are needed. The proofs of all four
lemmas are omitted. Details are available from the author.
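The sign pattern of the cross terms in (9.2) follows from squaring a signed sum of four component errors. The sketch below assumes the estimator is the combination $\hat\psi_g(0|2,1) - \hat\psi_g(3|1,0) - \hat\psi_g(2|0,1) - \hat\psi_g(1|1,1)$, as the signs in (9.2) indicate; the simulated errors are arbitrary stand-ins for the $\hat\mu_g$ terms, not output of the actual estimators.

```python
import numpy as np

# Component order: (0|2,1), (3|1,0), (2|0,1), (1|1,1), entering with signs +, -, -, -.
signs = np.array([1.0, -1.0, -1.0, -1.0])

def cross_term_coefficients(signs):
    """Coefficient of E[mu_a mu_b]: 1 on the diagonal, 2*s_a*s_b for each pair a < b."""
    coef = 2.0 * np.outer(signs, signs)
    np.fill_diagonal(coef, 1.0)
    return np.triu(coef)

def mse_from_components(errors, signs):
    """MSE of the signed sum, assembled from component second moments."""
    second_moments = errors.T @ errors / errors.shape[0]   # estimates E[mu_a mu_b]
    return float(signs @ second_moments @ signs)

rng = np.random.default_rng(0)
# Hypothetical correlated, biased component errors (illustration only).
mixing = rng.standard_normal((4, 4))
errors = rng.standard_normal((10_000, 4)) @ mixing + np.array([0.1, -0.2, 0.05, 0.0])

direct = float(np.mean((errors @ signs) ** 2))
decomposed = mse_from_components(errors, signs)
print(cross_term_coefficients(signs))
print(direct, decomposed)
```

The three pairs involving the first component carry coefficient $-2$ and the remaining three pairs carry $+2$, matching the six cross terms of (9.2).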
Let us define

$$\psi^*_g(p|r,s) = \frac{1}{n(n-1)}\sum\sum_{i \ne j} q_1(X_i)^r q_2(X_i)^s L_g^{(p)}(X_i - X_j). \tag{9.3}$$

Performance of $\hat\psi_g(p|r,s)$ is dominated by the performance of $\psi^*_g(p|r,s)$. The
following lemma is concerned with $\psi^*_g(p|r,s)$.

Lemma 2. Let $\psi^*_g(p|r,s)$ be as given in (9.3). Then, as $n \to \infty$, $g \to 0$,
MSE[r g ( P \r,s)] 

= Bias[t/>*(p\r, s)] 2 + Vax[ip*(p\r, s)] 

= ^IMp + 2|2r, *) 2 + -A^,K0|2r, 2s ) 

+ -f / f(x){w(x)fM(x) + {wf}M(x)} 2 dx-4Eir g (p\r,stf 

ft _J 

+ o(n- l + n- 2 g- 2p - 1 ), 
for p even and 
MSE[r g (p\r,s)} 

g A 

= Y /x 2,i' ( /'(p + 2|r,s) 2 

x 2nVp-i / f{x){{w2 • /}(2) (x) " w{x){w • /}(2) (x)} <** 
+



n 



f(p)/ 



l(p)i 



f(x){w(x)fW(x)-{wf}W( x )ydx-4E[r g (p\r,s)Y 

+ o(n- 1 +n"V 2p+1 ) 
for $p$ odd, where

$$w(x) = q_1(x)^r q_2(x)^s.$$

The notation

$$w_{r,s}(x) = q_1(x)^r q_2(x)^s, \qquad \phi_p(x) = L_g^{(p)}(x)$$

is used in the next lemma.

Lemma 3. As $n \to \infty$, $g \to 0$, we have
E[tpg(pi\r 1 ,s 1 )ip*(p 2 \r2,S2)] 

= E[w ri , sl (X 1 )(l> pi (X 1 -X 2 )}E[wr 2 ,s 2 (X 1 )$ P2 {X l -X 2 )} 

A t l,L(Pi)L(P2) 

n 2gP 1 +p 2 

X / f[{Wr 1 +r 2 ,s 1 +S2- f} (l) + {-l) P2 Wr 2 ,s 2 {Wn, S1 ■ f} {1) ](x) d X 
1 



//K«/ w +(-irK.f/} (,1) } 

X {^ 2 , S2 / (P2) + ("1) P2 {^ 2 , S2 • /} (P2) }(^) dx 

-AE[w ruSl {X 1 )cl )pi {X x -X 2 )]E[ Wr2 , S2 {X x ) ( j )p2 {X 1 -X 2 )} 

+ o(n- 1 +n- 2 g- pi - p2 ) 

for $p_1 + p_2$ odd, and

E bPg(Pi\ri,si)ip*(p 2 \r 2 ,s 2 )] 

= E[w ri , sl (Xx)(f> pi (Xi-X 2 )]E[wr 2 ,s 2 (X 1 )^ P2 {X 1 -X 2 )} 

2 
+ 



+ 



n 2gPl+p 2 



1 



II 



r V(0|n + r 2 , Sl + s 2 ) / L( pi )(z)L( p2 )(z) dz 

/K 1 , sl / (pi) + (-i) pi {^ 1 ,, 1 -/} (pi) } 



X {^r 2 , S2 / (P2) +(-l) P2 {^r 2 , S2 -/} (P2) }(x)(ix 

- 4E7[«; riifll (Xi)0 Pl (Xi - Xa)]^^^^)^,^! - X 2 )} 
+ o(n- 1 + n-V Pl_P2 " 1 ) 
for $p_1 + p_2$ even, with both $p_1$ and $p_2$ being even, and

E H>*g{Pl\ r l,Sl)i>* g {p2\r2,S2)} 

= E[w ri ,s 1 (Xl)&Pi(Xl-X2)]E[Wr 2 ,s 2 (Xl)(i> P2 (X 1 - X 2 )\ 
A t 2,L(Pi)L<P2) 



+ 



+ 



2 n 2 ff pi+p 2 
1 



T J /[{«Vi+ra,Ji.«a • /} (2) - ^r 2 , S2 {^n, S i ■ /} (2) ]0) dx 



ri 



x {u^ jaa /<»> + (-If 2 w 2lS2 • /} (P2) }(^) <& 

+ o(n- 1 +n-V Pl_P2 " 1 ) 
for both $p_1$ and $p_2$ being odd.

Hereafter, we adopt the notation 



v£5 



Br, 



1 \2 L f( x i- x j) w ( x i) 



n(n — 1) 



d f g > (x,e) yf g"(x,e) 
,{x) ae\g(x,e)i I g (x,e) 



--0o 



d 2 f g'(x,9) Yf g"(x,8) 



w ^ deae T \ g (x,9) J I g(x,e) 

The behavior of $\hat\psi_g(p|r,s)$ is summarized in the next lemma.

Lemma 4. As $n \to \infty$, $g \to 0$, we have

MSE$ g (p\r,s)} 

+ 2^-1 / f ^^ U ' 2 • ^^ + (-^M*)^ • /} (2) (^)} cfa 



1 
+ - 

n 



(P)( 



f(x){w(x)f (p > (s) + (-l) p {u; • /} w (a)}" dx 



-4E[r g (p\r,s)] 2 + E[A n ] T Z I E[A n ] 
+ 2{E[w(X 1 )4>(X 1 - X 2 )(h + h)]} T E[A r , 

+ o(n- 1 +n- 2 g- 2p+1 ) 
for p odd and 

MSE$ g (p\r,s)] 

= ^IlHp + 2|r, s) 2 + - 1 A TT i?(L(p))^(0|2r, 2a) 



f(x){w(x)f ( - p \x) + (-l) p {w ■ f} {p \x)} 2 dx 
4E[r g (p\r,s)] 2 + E[A n ] T ^jE[A n ] 
+ 2{E[w(X 1 )<fi(X 1 - X 2 )(h + I 2 ))} T E[A n ) 



+ o(n~ 1 +n~ 2 g~ 2p - 1 ) 



for p even. 



Lemma 5. As $n \to \infty$, $g \to 0$, we have
E\fig(pi\ri,si)p,g(p2\r 2 ,S2)] 

= -r^l^ipi + 2 ln , si)4>(p 2 + 2\r 2 , s 2 ) 



A t l,Z,(Pi)L(P2) 

n 2gP!+P 2 



X / f[{w ri +r 2 ,s 1 +S2 ■ f} (1) + {- l ) P2w T2,s 2 {Wr 1 , Sl • f}^'](x) dx 

+ - [ //K,« • f {pi) + (-i) pl K 1)S1 • /} (P1) ] 

X K, S2 • /(«) + (-ir{w r2 , S2 ■ f} iP2) ]( X )dx 

- 4E[w ri , Sl (Xi)cP Pl (Xi - X 2 )]E[w r2 ,s 2 (X 1 ) ( j )p2 (X 1 - X 2 )\ 
+ E[w ruSl (X 1 )cf )pi (X 1 -X 2 )(I l +I 2 )} T E[A n (p 2 \r 2 ,s 2 )} 
+ E[w r2!S2 (X 1 )cf )p2 (X 1 -X 2 )(I 1 +I 2 )} T E[A n (p 1 \r 1 , Sl )} 

+ E[A n (p 1 \n,s l )] T 'ZiE[A n (p 2 \r 2 ,s 2 )] 

+ o{n- 1 +n- 2 g- pi - p2 ) 
E[fi g (pi\ri, Si)flg(p2\r2, s 2 )] 

g 4 

= -J^L^iPi + 2 l r i » si)ip(p2 + 2|r 2 , s 2 ) 
2 ^0,L(Pi)L(P2) ,, i , 

__^_^(0|ri + r 2 , fl i + a 2 ) 
rg 1. 



1 

+ - 

n 



/K, sl -/ (pi) + (-i) Pl W llS1 -/} (pi) ] 



x k, S2 . /(w) + (-ir{^, 2 , S2 • /} (P2) ](x)dx 



-4ek 1iS1 (Xi)0 p1 (Xi-x 2 )]^k, S2 (Xi)0 P2 (Xi-x 2 )] 

+ ^K 1 , S1 (X 1 )0 P1 (X 1 -X 2 )(h +/ 2 )] T ^[^„(p 2 |r 2 , S2 )] 
+ E[w r2 , S2 (X 1 )ct) P2 {X 1 -X 2 ){h +l2)] T E[A n (p 1 \r 1 ,s 1 )} 

+ E[A n (p 1 \r 1 ,s 1 )] T ^ I E[A n (p 2 \r 2 ,s 2 )} 

+ o(n~ 1 +n- 2 g- pi -P 2 - 1 ) 
for $p_1$ even and $p_2$ even, and
E[fig(pi\r 1 ,s 1 )il g (p2\r2,S2)} 

= -jl4,L^(Pi + 2 l r i> si)tp(p2 + 2|r 2 , s 2 ) 



+ 



A t 2,L(Pi)L(P2) 



T / /[{^r 1+ r 2 , Sl + S2 • /} (2) - Wr 2lS2 {Wr 1)Sl • /} (2) ](^) <& 



1 
+ - 

n 



2 n 2 9 pi+p 2 - 

>K liS1 -/^) + (-ir{^ llS1 -/} (pi) ] 

x K,, S2 • /(*») + (-iy*{ Wr2jS2 . f}W](x)dx 

- AE[w ruSl (X 1 )^ Pl (X 1 - X 2 )]E[wr 2>S2 (X 1 )4> P2 (X 1 - X 2 )\ 

+ E[w ri>Sl (X 1 )c/ )pi (X 1 -X 2 )(I 1 + I 2 )} T E[A n (p 2 \r 2 ,s 2 )} 
+ J BK 2 , S2 (X 1 )^ 2 (X 1 -X 2 )(/ 1 + / 2 )] T J BK(p 1 |r 1 ,si)] 

+ E[A n {p 1 \r l ,s 1 )} T H I 

+ o(n- 1 + n"V Pl " P2+1 ; 
for $p_1$ odd and $p_2$ odd.
Proof of Theorem 2 (continued). By applying Lemmas 4 and 5 to
(9.2) and rearranging, the MSE expression of $\hat{\mathcal N}_g$ is obtained. This completes
the proof. □

Acknowledgments. I thank an Associate Editor and a referee for useful 